Preface



This book provides an introductory, practical and illustrative guide to the design of
experiments and data analysis in the biological and agricultural plant sciences. It is aimed both
at research scientists and at students (from final year undergraduate level through taught
masters to PhD students) who either need to design their own experiments and perform
their own analyses or can consult with a professional applied statistician and want to have
a clear understanding of the methods that they are using. The material is based on courses
developed at two British research institutes (Rothamsted Research and Horticulture
Research International [HRI - then Warwick HRI, and now the School of Life Sciences,
University of Warwick]) to train research scientists and post-graduate students in these
key areas of statistics. Our overall approach is intended to be practical and intuitive rather
than overly theoretical, with mathematical formulae presented only to formalize the
methods where appropriate and necessary. Our intention is to present statistical ideas in the
context of the biological and agricultural sciences to which they are being applied,
drawing on relevant examples from our own experiences as consultant applied statisticians at
research institutes, to encourage best practice in design and data analysis.

The first two chapters of this book provide introductory, review and background
material. In Chapter 1, we introduce types of data and statistical models, together with an
overview of the basic statistical concepts and the terminology used throughout. The
training courses on which this book is based are intended to follow preliminary courses that
introduce the basic ideas of summary statistics, simple statistical distributions (Normal,
Poisson, Binomial), confidence intervals, and simple statistical tests (including the t-test
and F-test). Whilst a brief review of such material is covered in Chapter 2, the reader will
need to be comfortable with these ideas to reap the greatest benefit from reading the rest of
the book. Some readers may feel that their knowledge of basic statistics is sufficiently
comprehensive that they can skip this review chapter. However, we recommend you browse
through it to familiarize yourself with the statistical terminology that we use.

The main body of the book follows. Chapters 3 to 11 introduce statistical approaches to
the design of experiments and the analysis of data from such designed experiments. We
start from basic design principles, introduce some simple designs, and then extend to more
complex ones including factorial treatment structures, treatment contrasts and blocking
structures. We describe the use of analysis of variance (ANOVA) to summarize the data,
including the use of the multi-stratum ANOVA to account for the physical structure of
the experimental material or blocking imposed by the experimenter, introduce simple
diagnostic methods, and discuss potential transformations of the response. We explain
the analysis of standard designs, including the randomized complete block, Latin square,
split-plot and balanced incomplete block designs in some detail. We also explore the issues
of sample size estimation and the power of a design. Finally, we look at the analysis of
unbalanced or non-orthogonal designs. Chapters 12 to 18 first introduce the idea of simple
linear regression to relate a response variable to a single explanatory variable, and then
consider extensions and modifications of this approach to cope with more complex data
sets and relationships. These include multiple linear regression, simple linear regression
with groups, linear mixed models and models for curved relationships. We also extend
related themes from the earlier chapters, including diagnostic methods specific to
regression. We emphasize throughout that the same type of models and principles are used for
both designed experiments and regression modelling. We complete the main body of the
book with a discussion of generalized linear models, which are appropriate for certain
types of non-Normal data.

We conclude with a guide to practical design and data analysis (Chapter 19), which
focuses on the selection of the most appropriate design or analysis approach for individual
scientific problems and on the interpretation and presentation of the results of the analysis.

Most chapters include exercises which we hope will help to consolidate the ideas
introduced in the chapter. In running the training courses from which this book has been
developed, we often find that it is only when students perform the analyses themselves that
they fully appreciate the statistical concepts and, most importantly, understand how to
interpret the results of the analyses. We therefore encourage you to work through at least
some of the exercises for each chapter before moving to the next one. There are fewer
exercises in the earlier chapters and the required analyses build in complexity, so we expect
you to apply knowledge gained throughout the book when doing exercises from the later
chapters. All of the data sets and solutions to selected exercises are available online. Some
of the solutions include further discussion of the relevant statistical issues.

We have set up a website to accompany this book (www.stats4biol.info) where we show
how to do the analyses described in the book using GenStat®, R and SAS®, three commonly
used statistical packages. Whilst users familiar with any of these packages might not refer
to this material, others are encouraged to review it and work through the examples and
exercises for at least one of the packages. Any errors found after publication will also be
recorded on this website.

By the time you reach the end of the book (and online material) we intend that you will
have gained

• A clear appreciation of the importance of a statistical approach to the design of
your experiments,

• A sound understanding of the statistical methods used to analyse data obtained
from designed experiments and of the regression approaches used to construct
simple models to describe the observed response as a function of explanatory
variables,

• Sufficient knowledge of how to use one or more statistical packages to analyse
data using the approaches that we describe, and most importantly,

• An appreciation of how to interpret the results of these statistical analyses in the
context of the biological or agricultural science within which you are working.

By doing so, you will be better able both to interact with a consultant statistician, should
you have access to one, and to identify suitable statistical approaches to add value to your
scientific research.

This book relies heavily on the use of real data sets and material from the original courses
and we are hence indebted to many people for their input. Particular thanks go to Stephen
Powers and Rodger White (Rothamsted Research) and John Fenlon, Gail Kingswell
and Julie Jones (HRI) for their contributions to the original courses; also to Alan Todd
(Rothamsted Research) for providing many valuable suggestions for suitable data sets.
The majority of real data sets used arose from projects (including PhDs) at Rothamsted
Research, many in collaboration with other institutes and funded from many sources;
we thank Rothamsted Research for giving us general permission to use these data. We
also thank, in alphabetical order, R. Alarcon-Reverte, S. Amoah, J. Baverstock, P. Brookes,
J. Chapman, R. Curtis, I. Denholm, N. Evans, A. Ferguson, S. Foster, M. Glendining, K.
Hammond-Kosack, R. Harrington, Y. Huang, R. Hull, J. Jenkyn, H.-C. Jing, A.E. Johnston,
A. Karp, J. Logan, J. Lucas, R. Lutman, A. Macdonald, S. McGrath, T. Miller, S. Moss, J. Pell,
R. Plumb, P. Poulton, A. Salisbury, T. Scott, I. Shield, G. Shortall, L. Smart, M. Torrance, P.
Wells, M. Wilkinson and E. Wright, for specific permission to use data from their own
projects or from those undertaken within their group or department at Rothamsted.
Rothamsted Research receives grant-aided support from the Biotechnology and Biological
Sciences Research Council of the United Kingdom. We thank various colleagues, past
and present, at Horticulture Research International, Warwick HRI and the School of Life
Sciences, University of Warwick, for permission to use data from their research projects,
particularly Rosemary Collier and John Clarkson. We thank M. Heard (Centre for Ecology
and Hydrology), A. Ortega Z. (Universidad Austral de Chile) and R. Webster for
permission to use data. Examples and exercises marked '*' use simulated data inspired by
experiments carried out at Rothamsted Research or HRI. The small remainder of original
examples and exercises (also marked '*') were invented by the authors but are typical of
the type of experiments we are regularly asked to design and the data we analyse as part
of our consultancy work. In the few cases where we have not been able to find examples
from our own work we have drawn on data from published sources. We would like to
thank Simon Harding for technical help in setting up a repository for our work and our
website and Richard Webster, Alice Milne, Nick Galwey, James Bell and Kathy Ruggeiro
and an anonymous referee for reading draft chapters and providing many helpful comments.

Finally, we would like to make some individual acknowledgements. SJW, SJC and SAG
thank Rothamsted Research, and in particular Chris Rawlings, for support and encouragement.



Introduction



This book is about the design of experiments and the analysis of data arising in biological
and agricultural sciences, using the statistical techniques of analysis of variance (ANOVA)
and regression modelling. These techniques are appropriate for analysis of many (although
not all) scientific studies and form an important basic component of the statistician's
toolbox. Although we provide some of the mathematical formulae associated with these
techniques, we have also tried to interpret the equations in words and to give insight into the
underlying principles. We hope that this will make these useful statistical methods more
accessible.

This chapter presents an introduction to the different types of data and statistical
models that are considered in this book, together with an overview of the basic statistical
concepts and terminology which will be used throughout. In particular, we discuss

• Types of scientific study

• Populations and samples

• Mathematical and statistical models used to describe biological processes

• The linear model - which underlies all the models and methods introduced in this
book

• Parameter estimation and statistical inference

• ANOVA - the major statistical tool used to evaluate and summarize linear models

At the end of this chapter, we preview the contents of the remaining chapters.



1.1 Different Types of Scientific Study 

We shall be concerned with data arising from both experimental and observational
studies. Although they have many common features, there are some subtle differences that
influence the conclusions that can be drawn from the analyses of data from these two
types of study.

An experimental study is a scientific test (or a series of tests) conducted with the
objective of studying the relationship between one or more outcome variables and one or more
condition variables that are intentionally manipulated to observe how changing these
conditions affects the results. The outcome of a study will also depend on the wider
environment, and the scientist will endeavour to control other variables that may affect
the outcomes, although there is always the possibility that uncontrolled, perhaps
unexpected, variables also influence the outcome. Adequate planning is therefore crucial to
experimental success. There are a few key elements that need to be clearly specified and
considered for an experimental study (Kuehl, 2000). These are the 

• aims of the experiment - usually expressed as questions or hypotheses 

• physical structure of the study materials 

• subjects or entities to be used 

• set of conditions to be investigated 

• other variables that might affect the outcome 

• outcome variables to be measured 

• protocols that define how the measurements are taken 

• available resources (e.g. money, time, personnel, equipment, materials) 

The aims of an experimental study need to be clearly specified, often in the form of 
hypotheses to be tested or a set of questions to be answered; this is a vital part of the 
planning process. The physical structure and subjects to be used should be chosen so 
that the results of the experiment can be related to a wider context (see Section 1.2). In 
addition, the set of conditions to be investigated must be chosen to answer directly the 
scientific questions. Other variables likely to affect the outcome should be identified and 
evaluated so that they can be controlled, as far as possible, and therefore do not interfere 
with the measured outcome. If they cannot be controlled then they should be measured. 
Consideration of the variables to be measured is often overlooked at the planning stage, 
but is important because it may affect both the statistical analysis and the efficiency of 
the design. As discussed later (Chapter 18), the analysis required for binary data (e.g. 
absence or presence of disease) or count data (e.g. numbers of insects or weeds present) 
may be different from that for a continuous variable (e.g. shoot length). A full defini- 
tion of measurement protocols is good practice and should reduce differences in proce- 
dure between scientists working on the same experiment, and improve repeatability of 
the results. Finally, the resources available will usually limit the size and scope of the 
experiment. 

Design of experiments is a process that brings together all the elements above to pro- 
duce an experiment that efficiently answers the questions of interest and aims to obtain 
the maximum amount of information for the resources available, or to minimize the 
resources needed to obtain the information desired. The main statistical principles used 
in constructing a good design are replication, randomization and blocking. These concepts 
are discussed in detail in Chapter 3. 

An observational study differs from an experimental study in that the application of 
conditions that affect the outcome is not directly controlled by the scientist. However, 
all the elements listed above for experimental studies should still be considered when 
you plan an observational study, although opportunities for the random allocation of 
conditions to subjects will be limited and sometimes non-existent. In observational stud- 
ies, the set of conditions to be investigated is first defined, and then subjects with these 
characteristics are sought and measurements made. Observational studies are often used 
in ecology where it is difficult to set up an experiment whilst retaining natural habitats. 
For example, a study might aim to determine the difference in beetle populations using 
selected field margins as the subjects under two conditions: with and without hedges. In 
this context, it is harder than in experimental studies to control other variables that may
affect the outcome. For example, the set of hedges available may be composed of several
plant types, which might in turn affect the species and abundance of beetles present. In 
addition, hedges are already in place, and fields with hedges may differ systematically 
in other characteristics from fields without hedges - in an extreme case they might be 
on different farms, with different farming methods used. The scientist should therefore 
consider that differences between conditions in an observational study might be caused 
by other unrecorded, or possibly unobserved, variables. In experimental studies, where 
we have greater control over conditions, this can still be true, but we can use randomiza- 
tion to guard against such unknown differences between subjects. But where there are 
potential uncontrolled sources of variability, the scientist should be wary of inferring 
direct causal relationships. Hill (1965) gave criteria that should be satisfied by a causative 
relationship in the context of epidemiology, and many of these criteria can be applied 
more widely and may be helpful in deciding whether a causal relationship is plausible for 
any observational study. 

The separation between experimental and observational studies is not complete, as 
some studies may have both experimental and observational components. However, both 
types of study incorporate structure, and we should take account of this structure in the 
planning, design, statistical analysis and interpretation of such studies. 



1.2 Relating Sample Results to More General Populations 

For most scientific studies there is an implicit assumption that the results obtained can 
be applied to a population of subjects wider than those included in the study, i.e. that the 
conclusions will apply more generally (although usually with caveats) to the real world. 
For example, in a field trial to investigate disease control it will generally not be possible to 
have very large plots, nor to assess visually every plant in a plot, and so a random sample 
of plants is selected from each plot. It is assumed that the sampled plants are representa- 
tive of all the plants in the plot and so the results from the sample are inferred to apply to 
the whole plot. In turn, we should usually have several plots within the trial with the same 
treatment applied and hope to infer the results from this sample of plots to the whole field. 
However, it is well-known that field experiment results vary markedly over years and 
locations, so the trial would ideally be performed at several locations across several years 
to provide a representative sample of environments. The combined results from the whole 
set of trials can then be claimed to apply to the region where they were carried out, rather 
than to a single field in a single year. 

In planning any scientific study, it is therefore important to consider the frame of refer- 
ence when experimental subjects are selected. The scientist should identify the population 
(wider group of subjects) to which they hope the experimental results will apply. Ideally, 
the subjects should then consist of a sample, or subset, drawn from this population. If the 
process of selecting a sample, known as sampling, is made at random, then it is reasonable 
to assume that the sample will have similar properties to the whole population, and we 
can use it to make statistical inferences about the population. Generally, as the number of 
observations in the sample increases, the inferences made about the population become 
more secure. If a sample is not taken at random, then this sense of the sample being repre- 
sentative of the population may be lost. 
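
The claim that inferences become more secure as the sample grows can be illustrated
with a small simulation. The sketch below is only an illustration under stated assumptions:
the population of 10,000 plant heights is invented (Normal, mean 50 cm, standard deviation
10 cm), and the function name spread_of_sample_means is hypothetical. It draws repeated
random samples of two sizes and compares how much the sample means vary around the
population mean.

```python
import random
import statistics


def spread_of_sample_means(population, sample_size, n_repeats, rng):
    """Standard deviation of the sample mean over repeated random samples."""
    means = [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(n_repeats)
    ]
    return statistics.stdev(means)


rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: heights (cm) of 10,000 plants (invented values)
population = [rng.gauss(50.0, 10.0) for _ in range(10_000)]

spread_small = spread_of_sample_means(population, 5, 500, rng)
spread_large = spread_of_sample_means(population, 50, 500, rng)

print(f"n = 5:  spread of sample means = {spread_small:.2f}")
print(f"n = 50: spread of sample means = {spread_large:.2f}")
```

The sample means from the larger samples cluster more tightly around the population
mean, which is the sense in which a larger random sample supports more secure inference.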



1.3 Constructing Models to Represent Reality

A model is an abstract representation of a hypothesized process that underpins a
biological or physical phenomenon, that is, a way of describing a real system in words, diagrams,
mathematical functions, or as a physical representation. In biology, models usually
correspond to a simplification of the real process, as no existing model can represent reality
in all details. However, this does not mean that models cannot be useful. A good model
summarizes the major factors affecting a process to give a representation that provides the
level of detail required for the objective of a particular study.

Mathematical models use mathematical notation and expressions to describe a process. 
A statistical model is a mathematical model that allows for variability in the process that
may arise from sampling variation, biological variation between individuals, inaccuracies
in measurement or influential variables being omitted (knowingly or not) from the model.
Therefore, any statistical model has a measure of uncertainty associated with it.

Models are additionally often classified as either process (or mechanistic) models or
empirical models. A process model purports to give a description of the real underlying
process. This type of model can be useful in testing our knowledge: if a model can be built
to reproduce the behaviour of the system accurately, then our knowledge of the process
(theory) is at least consistent with reality. Conversely, and arguably more usefully, failure
of a process model may indicate gaps in knowledge that can be pursued by further
experimentation. Process models are often complex, with many parameters, but can sometimes
be fitted using statistical principles (see e.g. Brown and Rothery, 1993, Chapter 10).

Statistical models usually fall under the category of empirical models, which use the
principle of correlation to construct a simple model to describe an observed response in
terms of one or more explanatory variables. Empirical models use the correlation between
the explanatory (input) variable(s) and the measured response (output) variable to build
a model without explicit reference to the true underlying process (although knowledge of
this process may be used both to select suitable input variables and to identify the
appropriate form of the relationship). This can be useful to identify variables that are influential
where no detailed knowledge of the process exists, although some care should be taken
with interpretation as there may be no direct causative relationship between the input
and output variables; instead they may both be driven by some other hidden (latent) or
unmeasured variable.

We shall consider statistical models of the general form

response = systematic component + random component.

This model can exist in abstract form, but we usually relate it to a set of measurements
that have been made. The response, or response variable, relates to one type of numerical
outcome from the study, sometimes also called the set of observations. The systematic
component is a mathematical function of one or more explanatory variables that provide
a representation of the experimental conditions. The systematic component describes the
relationship between the response and these explanatory variables and hence between the
response and the experimental conditions. Where the conditions have a direct numerical
evaluation, such as count, weight or height, the explanatory variable is termed
quantitative. We refer to quantitative variables as variates. Where the conditions are classified
into groups or categories the explanatory variable is termed qualitative. In this case, the
explanatory variable indicates the group to which each subject belongs. We shall refer to
qualitative variables as factors and identify the distinct groups in the factor as the factor
levels. For example, sex would be a factor with two levels: male and female. Note that it 
is sometimes convenient to group a quantitative variable into categories so as to treat it 
as a qualitative variable, for example, heights can be classified as short, medium or tall. 
However, this change cannot always be made in reverse; some explanatory variables, such 
as sex, are inherently qualitative. Similarly, if a scientist had compared three types of fertil- 
izer, or one fertilizer across three different plant varieties, then the levels of the explanatory 
variable (fertilizer type or plant variety) cannot be translated into meaningful numbers. In 
the context of experimental studies, the conditions imposed by the experimenter are usu- 
ally represented as factors and referred to as treatments. We also use this term more gen- 
erally to describe the set of conditions present in observational studies when represented 
by factors. In some contexts, where it is more natural, we use the alternative term groups 
instead of treatments. 

In general, the systematic component of the statistical models that we consider can be 
partitioned further into explanatory and structural components as 

systematic component = explanatory component + structural component.

The explanatory component corresponds to the conditions of interest, or treatments, in 
the study. The structural component is used to account for the structure of the study, such 
as sub-sampling within an observational study or blocking within a designed experiment. 
The structural component is not always present: it may be omitted in the (rare) case that 
the experimental material consists of an unstructured sample. This partition facilitates the 
accurate specification of the whole model, as it encourages us to consider the two compo- 
nents separately: the explanatory component relates to our hypothesis (or hypotheses) of 
interest, and the structural component relates to the structure of the experimental material. 

The random component, also known as error or noise, corresponds to variation in the 
response that is not explained by the systematic component. This component may have 
several sources, such as inherent between-subject variability, measurement errors and 
background variation within the environment of the study. Mathematically, we usually 
describe the random component in terms of some appropriate probability distribution (see 
Chapters 2 and 4). 

The systematic component is used to predict the response for any set of experimental 
conditions, and the random component is used to estimate the uncertainty in those predic- 
tions. Here are two simple examples of statistical models. 



EXAMPLE 1.1: QUALITATIVE EXPLANATORY VARIABLE 

Consider an experiment to investigate nutrient feeding strategies for plants grown in 
pots. A scientist has obtained a new liquid feed and wishes to evaluate its effect on 
plant growth. The instructions provided by the manufacturer suggest three feeding 
regimes labelled A, B and C. The scientist decides to grow 12 plants of a single plant 
variety, each one in a separate pot, and to allocate four plants at random to each of the 
three suggested regimes. After six weeks, the height of each plant is measured. Here, 
the response variable is plant height and the only explanatory variable is the feeding 
regime, which is a qualitative variable with three levels. 
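
The random allocation of plants to regimes described above can be sketched in a few lines
of code. This is a minimal illustration rather than a prescribed procedure: the pot numbering
(1 to 12), the function name allocate_at_random and the seed are all assumptions made here
for the example.

```python
import random


def allocate_at_random(units, treatments, reps, rng):
    """Allocate each treatment to `reps` units, chosen completely at random."""
    labels = [t for t in treatments for _ in range(reps)]
    if len(labels) != len(units):
        raise ValueError("number of units must equal treatments x reps")
    rng.shuffle(labels)  # random permutation of the treatment labels
    return dict(zip(units, labels))


rng = random.Random(2024)  # arbitrary seed, fixed so the layout can be reproduced
pots = list(range(1, 13))  # hypothetical pot numbers 1..12
allocation = allocate_at_random(pots, ["A", "B", "C"], 4, rng)

for pot, regime in allocation.items():
    print(f"pot {pot:2d}: regime {regime}")
```

Each regime is always assigned to exactly four pots; only which pots receive which
regime is left to chance.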

We might hypothesize that the plant height for a given feeding regime can be 
expressed symbolically as 

height = overall mean + effect of feeding regime + deviation.

This is a simple (empirical) statistical model with height as the response. For an
unstructured sample of 12 pots, there is no need for a structural component. So here the
systematic part of the model (i.e. overall mean + effect of feeding regime) relates only to
the explanatory component, with plant height modelled as an overall mean modified 
by some specific amount for each feeding regime. The random part (labelled deviation) 
allows for the deviation of individual observations from the feeding regime value given 
by the systematic component. Using mathematical notation (see also Section 2.1) we can 
write this model as 



$y_{jk} = \mu + \tau_j + e_{jk}$.    (1.1)

Here, we have identified each plant by labelling it by the treatment applied ($j$ = 1, 2, 3
for regimes A, B, C, respectively) and then we number the plants within each treatment
group (using $k$ = 1, 2, 3, 4). Hence, $y_{jk}$ represents the height of the $k$th plant with the $j$th
feeding regime. We use $\mu$ to represent the population mean height (the 'overall mean'),
and $\tau_j$ represents the difference in response for the $j$th feeding regime relative to the
overall mean (the 'effect of the feeding regime'). Finally, $e_{jk}$ is the deviation associated
with the $k$th replicate plant under the $j$th feeding regime.

The symbols $\mu$ and $\tau_1$, $\tau_2$, $\tau_3$ (usually written as $\tau_j$, $j$ = 1 ... 3) are unknown population
parameters that have to be estimated from the observed sample from the experiment.

This model represents the height of a plant under the $j$th regime using the systematic
component $\mu + \tau_j$, so a different value pertains to each regime, as shown in Figure 1.1a.
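
To make the roles of the overall mean, the regime effects and the deviations concrete, the
sketch below splits a set of heights into these three parts. It rests on assumptions that are
not in the text: the twelve heights are invented, and the estimates are taken as the simple
averages that apply when every regime has the same number of plants (overall mean = mean
of all heights; regime effect = regime mean minus overall mean).

```python
import statistics

# Hypothetical plant heights (cm) after six weeks; values invented for illustration
heights = {
    "A": [20.0, 22.0, 21.0, 21.0],
    "B": [25.0, 27.0, 26.0, 26.0],
    "C": [30.0, 31.0, 29.0, 30.0],
}

# Estimate of the overall mean: average of all 12 heights
all_heights = [y for ys in heights.values() for y in ys]
mu_hat = statistics.mean(all_heights)

# Estimate of each regime effect: regime mean minus overall mean
# (valid as written because each regime has the same number of plants)
tau_hat = {j: statistics.mean(ys) - mu_hat for j, ys in heights.items()}

# Deviations: observed height minus the systematic part (overall mean + effect)
deviations = {
    j: [y - (mu_hat + tau_hat[j]) for y in ys] for j, ys in heights.items()
}

print(f"overall mean: {mu_hat:.3f}")
for j in heights:
    print(f"regime {j}: effect {tau_hat[j]:+.3f}, deviations {deviations[j]}")
```

With equal replication the estimated effects sum to zero, and the deviations within each
regime also sum to zero, so every observation is recovered exactly as overall mean plus
effect plus deviation.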

In Example 1.1, the explanatory variable (feeding regime) is a qualitative variable, or fac- 
tor, with three levels (A, B and C). Without further information we cannot infer relation- 
ships between these factor levels and so we model the response by fitting a separate effect 
for each level. However, if the different feeding regimes correspond to different applica- 
tion rates for the liquid feed, then the scientist could evaluate the quantities corresponding 
to each feed rate and turn them into quantitative values (numbers). We can then consider 
other models for these data as shown in Example 1.2. 

EXAMPLE 1.2: QUANTITATIVE EXPLANATORY VARIABLE 

Suppose now that the scientist in Example 1.1 has evaluated the volumes (or doses) for 
feeding regimes A, B and C as 20, 40 and 60 mL per plant, respectively. The explana- 
tory variable now corresponds to a quantitative variable (i.e. dose) with numeric values, 
and we can reasonably consider the response as a function of this continuous variable, 
expressed symbolically as 



height = f(dose) + deviation, 

where f(dose) indicates some mathematical function of dose. Here, we assume the sim- 
plest case, namely that the function is a straight line relationship (see Figure 1.1b). We 
can formally write this simple model as 



$y_{jk} = \alpha + \beta x_j + e_{jk}$.    (1.2)

We again label each plant by the treatment applied (here $j$ = 1, 2, 3 for doses 20, 40 and
60 mL, respectively) and then number plants within each treatment group (using $k$ = 1,
2, 3, 4) so $y_{jk}$ is the height of the $k$th replicate plant with the $j$th dose. Now, $x_j$ is the
numerical quantity of the $j$th dose ($x_1$ = 20, $x_2$ = 40, $x_3$ = 60), $\alpha$ is the plant height at zero dose
(the intercept of the line in Figure 1.1b with the y-axis at $x$ = 0), $\beta$ is the linear response to
increasing the dose by 1 mL (the slope of the line in Figure 1.1b), and $e_{jk}$ is the deviation
from the linear trend for the $k$th replicate plant with the $j$th dose. The symbols $\alpha$ and
$\beta$ are unknown population parameters that have to be estimated from the observed
sample.

[Figure 1.1: panel (a), x-axis 'Feeding regime'; panel (b), x-axis 'Dose (x)'.]

FIGURE 1.1
Two linear models with observed (•) and population (line) responses (heights) for the plant growth experiment
for (a) a qualitative explanatory variable representing three feeding regimes (A, B and C, Example 1.1), and (b) a
quantitative explanatory variable representing three doses (20, 40 and 60 mL, Example 1.2).

The model represented by Equation 1.2 differs from the model represented by Equation
1.1 in several important respects, even though it could arise from the same experiment.
In Example 1.1, feeding regime was considered to be a qualitative variable (and so here
we call Equation 1.1 the qualitative model), and a separate effect was allowed for each
level. In Example 1.2, we used additional information, that is the numeric values of dose,
to fit height as a linear function of dose (and so here we call Equation 1.2 the
quantitative model). The qualitative model might be considered more flexible, as it does not make
any assumption about the shape of the relationship. However, the quantitative model has
the advantage that it is more parsimonious, i.e. that it uses fewer parameters to describe
the pattern. It has the further advantage that we can also make predictions at
intermediate doses (e.g. 50 mL) using the fitted model (under the assumption that the straight line
model is appropriate).



1.4 Using Linear Models 

Equations 1.1 and 1.2 are simple examples of linear models, an important sub-class of the statistical models introduced in Section 1.3. In this context, the response variable is sometimes called the dependent variable and the explanatory variables are sometimes called independent or predictor variables. The explanatory and structural components of a linear model each consist of a set of terms added together (an additive structure) and each term consists of either a single unknown parameter (such as $\tau_j$ in Equation 1.1), or an unknown parameter multiplied by a known variable (such as $\beta x_j$ in Equation 1.2); this is the linear structure. The random component, or deviation, is added to the systematic component to give the full model. In general, linear models might contain terms for several qualitative or quantitative explanatory variables or both. It is important, but slightly confusing, to note that the output from a complex linear model will generally not be a straight line (e.g. Equation 1.1), although the straight line relationship between a response variable and a single explanatory variable (e.g. Equation 1.2) is the simplest example of a linear model. The class of linear models is a large and flexible one and, although the models themselves are usually approximations, they can adequately represent many real-life situations. The most common uses for linear models are model specification, parameter estimation and prediction.

The main objective in model specification is to determine what form of statistical model 
best describes the relationship between the response and explanatory variable(s). There will 
often be a biological hypothesis behind a study that suggests a suitable form of model and
the explanatory variables that should be included in the model. For example, in Example
1.1 the scientist wanted to investigate whether the different feeding regimes had detectable 
effects on plant growth. The process of statistical hypothesis testing can be used to refine the 
model by determining whether there is any evidence in the data that the proposed explana- 
tory variables explain patterns in the response. Often several competing models might be 
compared. If many potential explanatory variables have been measured, variable screening
may be used to select the variables that best explain the variation in the response. For
example, in field studies on insect abundance, many climatic and environmental variables 
can be measured, and those that are most highly related to insect counts then identified. 

Once an appropriate model has been determined, parameter estimation (see Section 
1.5) is required to interpret the model and, potentially, the underlying biological process. 
Associated with each parameter estimate is a measure of uncertainty, known as the stan- 
dard error. 

The fitted model can be derived by substitution of estimates in place of the unknown 
parameter values in the model, and uncertainty in the fitted model is derived from the 
parameter standard errors. Prediction involves the use of the fitted model to estimate 
functions of the explanatory variable(s) - for example, the prediction of a treatment mean 
together with some measure of its precision. Again, uncertainty in predictions is derived 
from uncertainty in the parameter estimates. 



1.5 Estimating the Parameters of Linear Models 

Any linear model has an associated set of unknown parameters for which we want to obtain estimates. For example, in fitting the models represented by Equations 1.1 and 1.2 to the observed data, our aim is to find the 'best' estimates of the parameters $\mu$, $\tau_1$, $\tau_2$ and $\tau_3$, or $\alpha$ and $\beta$, respectively. In Chapters 4 (qualitative model) and 12 (quantitative model) we present detailed descriptions of how to obtain estimates of these parameters; here, we outline the basic principles. Before we consider the estimation process, some basic notation is required. In general, we represent estimated parameter values by placing a 'hat' (^) over the parameter symbol, for example, $\hat{\mu}$ indicates an estimate of $\mu$, the population mean. Then, the fitted value for an observation $y_{jk}$, denoted $\hat{y}_{jk}$, consists of the systematic component of the model with all parameters replaced by their estimates. So, in the qualitative model represented by Equation 1.1, the fitted value for the kth plant with the jth feeding regime is

$$\hat{y}_{jk} = \hat{\mu} + \hat{\tau}_j , \tag{1.3}$$

which is an estimate of the population mean for plants with the jth feeding regime. For the quantitative model in Equation 1.2, the corresponding fitted value is

$$\hat{y}_{jk} = \hat{\alpha} + \hat{\beta} x_j . \tag{1.4}$$

For all linear models, parameters are estimated with the principle of least squares. This method finds the 'best-fit' model in the sense that it finds estimates for the parameters that minimize the sum, across all observations, of the squares of the differences between the observed data and fitted values. For example, for the qualitative model (Equation 1.1) we minimize

$$\sum_{j=1}^{3} \sum_{k=1}^{4} (y_{jk} - \hat{y}_{jk})^2 ,$$

where $\hat{y}_{jk}$ was defined in Equation 1.3. For the quantitative model (Equation 1.2), the quantity minimized has the same generic form, but now Equation 1.4 is used to define the fitted values. The symbol $\Sigma$ is used to indicate the sum across the specified index (see Section 2.1 for more details). Note that these summations are over all combinations of the three factor levels (j = 1, 2, 3) and the four replications (k = 1, 2, 3, 4), and hence over the full set of 12 observations. Having found the best-fit model for our observed data, we can calculate fitted values based on the parameter estimates. We can then obtain estimates of the deviations, called residuals, from the discrepancy between the observed and fitted values, as

$$\hat{e}_{jk} = y_{jk} - \hat{y}_{jk} .$$

If the residuals are relatively small, then our model gives a good description of the data. These residuals can be examined to assess the validity of our model (to diagnose any lack of fit of the model to the data) and the assumptions made in fitting the model to the data (Chapters 4 and 12). One such assumption concerns an underlying probability distribution for the deviations (see Chapter 4), and the estimated variance of this distribution is used to calculate the parameter standard errors. This variance, often called the residual variance, provides a measure of uncertainty which can also be used in hypothesis testing and to form confidence intervals for predictions.
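The least squares calculations described in this section can be sketched in a few lines of Python. This is only an illustration: the heights below are hypothetical values invented for the sketch (they are not the data of Examples 1.1 and 1.2), and the names `heights`, `group_fitted` and `fitted_line` are our own. The sketch relies on two standard results: for the qualitative model the least squares fitted value for each group is the group mean, and for the straight line model the slope is the sum of cross-products divided by the sum of squares of the centered doses.

```python
# Hypothetical plant heights (cm) for 4 replicate plants at each dose (mL).
# These values are invented for illustration; they are NOT from the book.
heights = {
    20: [10.0, 12.0, 11.0, 13.0],
    40: [15.0, 14.0, 16.0, 17.0],
    60: [19.0, 21.0, 20.0, 22.0],
}


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# Qualitative model: the least squares fitted value for every plant in a
# group is that group's sample mean.
group_fitted = {dose: mean(ys) for dose, ys in heights.items()}

# Quantitative model: fit a straight line to the 12 (dose, height) pairs.
x = [dose for dose, ys in heights.items() for _ in ys]
y = [h for ys in heights.values() for h in ys]
x_bar, y_bar = mean(x), mean(y)
beta_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
alpha_hat = y_bar - beta_hat * x_bar


def fitted_line(dose):
    """Fitted value from the straight line model at a given dose."""
    return alpha_hat + beta_hat * dose


# Residuals: observed minus fitted, for each model.
residuals_qual = [h - group_fitted[d] for d, ys in heights.items() for h in ys]
residuals_quant = [yi - fitted_line(xi) for xi, yi in zip(x, y)]

# Residual sums of squares: the quantity that least squares minimizes.
rss_qual = sum(e ** 2 for e in residuals_qual)
rss_quant = sum(e ** 2 for e in residuals_quant)

print("slope:", beta_hat, "intercept:", alpha_hat)
print("residual SS (qualitative, quantitative):", rss_qual, rss_quant)
print("prediction at 50 mL:", fitted_line(50))
```

With these invented data the qualitative model has the smaller residual sum of squares, as it must, because it uses three parameters for the systematic component where the straight line uses two. Only the straight line model gives a prediction at an untested dose such as 50 mL.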



1.6 Summarizing the Importance of Model Terms 

The main tool we use for the statistical analysis of any linear model, with either qualita- 
tive (factor) or quantitative (variate) explanatory variables, or both, is the analysis of vari- 
ance, usually abbreviated as ANOVA. As the name suggests, the principle behind ANOVA 
is the separation and comparison of different sources of variation. In its simplest form, 
ANOVA quantifies variation in the response associated with the systematic component 
of the model (systematic variation) and compares it with the variation associated with the 
random component of the model (often called noise or background variation). Informally,
if the ratio of systematic variation to background variation is large then we can conclude
that the proposed model accounts for much of the variation in the response, and that the 
explanatory variables provide a good explanation of the observed response. However, if 
the ratio of systematic variation to background variation is small, then it does not neces- 
sarily indicate that the response is not related to the explanatory variables - it may just be 
that the background variation is too large to clearly detect any relationship. We can use 
ANOVA to assess whether the variation associated with different levels, or groups of lev- 
els, of a qualitative explanatory variable (factor) is larger than the background variation, 
which would give evidence that the explanatory variable is associated with substantive 
changes in the response. Similarly, we can assess whether there is substantive variation in 
the response associated with some trend in a quantitative explanatory variable (variate). 
We can often also partition variation associated with different explanatory variables to 
assess their relative importance, and a well-designed experiment can make this easier. We 
use ANOVA to summarize model fitting in two related contexts. 

We first consider the use of ANOVA in structured scientific studies where we include the 
experimental conditions as factors, and wish to relate variation in the response to variation 
in the conditions. For example, consider a traditional field trial to assess the yield response 
of a set of plant varieties to different levels of fertilizer application. Here, the experimen- 
tal conditions are combinations of plant variety and fertilizer application, with both con- 
sidered to be qualitative variables. In a basic analysis, we are interested in identifying 
whether differences between plant varieties or fertilizer application levels, or particular 
combinations of these factors, provide an explanation for the observed differences in yield 
response. Within this context we can then generalize this basic analysis in several different 
ways: to take account of the physical structure of the experimental units (e.g. to allow for 
the blocking of experimental units); to take account of any quantitative scale underlying 
the factor levels (e.g. the nitrogen content of the fertilizer applications); and, in a limited 
way, to account for other explanatory variables that may have been measured (e.g. per- 
haps soil pH varies across the field and affects yield). This is the traditional framework 
for ANOVA and most statistical packages have algorithms tailored to the analysis of data 
within this framework (e.g. the ANOVA command in GenStat, the aov() function in R and
the proc glm procedure in SAS). 

We then consider the use of ANOVA in scientific studies where the main aim is to model 
the response as a mathematical function of one or more quantitative explanatory vari- 
ables. This context is usually called regression modelling or regression analysis, and 
we emphasize the particular case of linear regression, where only linear functions of one 
or more continuous explanatory variables are permitted. For example, suppose a forester 
wishes to build a model to predict timber volume from easily measured field variables 
such as tree diameter and height. In a basic analysis, having measured both the field vari- 
ables and the actual timber volume for a number of trees, we are interested in determining 
which field variables (or combinations of field variables) explain the observed differences 
in timber volume. Again, within this context we can generalize the basic analysis to take 
account of any grouping of observations, such as tree variety or location. Within regres- 
sion modelling, ANOVA is the main statistical tool used for assessment of the importance 
of different explanatory variables. Statistical software packages usually contain more general
algorithms for regression analyses (e.g. FIT in GenStat, the lm() function in R and the
proc reg procedure in SAS).

It should be clear that there is much overlap between these two contexts. For example, 
both the qualitative model of Example 1.1 and the quantitative model of Example 1.2 could 
be analysed by either type of algorithm. However, using different algorithms to analyse the same data set can be confusing, because even when the methods are equivalent, the results may appear to differ if different conventions are used for their presentation. One of the main aims of this book is to explain the rationale behind these different conventions, and so to eliminate this confusion.
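The informal comparison of systematic and background variation described in this section can be sketched numerically. The heights below are hypothetical values for three feeding regimes, invented for this sketch, and the variable names are our own. We also assume the conventional one-way ANOVA summary, in which each sum of squares is divided by its degrees of freedom before the ratio is formed; the formal definitions are deferred to Chapter 4, so this is a preview under that assumption and not the book's worked analysis.

```python
# Hypothetical heights (cm) for 4 replicate plants under 3 feeding regimes.
# Invented for illustration only; NOT data from the book.
heights = {
    "A": [10.0, 12.0, 11.0, 13.0],
    "B": [15.0, 14.0, 16.0, 17.0],
    "C": [19.0, 21.0, 20.0, 22.0],
}

all_values = [h for ys in heights.values() for h in ys]
grand_mean = sum(all_values) / len(all_values)
group_means = {g: sum(ys) / len(ys) for g, ys in heights.items()}

# Systematic variation: spread of the group means about the grand mean,
# weighted by the number of observations in each group.
ss_systematic = sum(
    len(ys) * (group_means[g] - grand_mean) ** 2 for g, ys in heights.items()
)

# Background variation: spread of observations about their own group mean.
ss_background = sum(
    (h - group_means[g]) ** 2 for g, ys in heights.items() for h in ys
)

# Total variation about the grand mean splits into the two sources above.
ss_total = sum((h - grand_mean) ** 2 for h in all_values)

# Assumed conventional scaling: divide each sum of squares by its degrees
# of freedom, then compare the two resulting mean squares.
df_systematic = len(heights) - 1
df_background = len(all_values) - len(heights)
variance_ratio = (ss_systematic / df_systematic) / (ss_background / df_background)

print("systematic SS:", ss_systematic)
print("background SS:", ss_background)
print("total SS:", ss_total)
print("variance ratio:", variance_ratio)
```

For these invented data the ratio is large, which is the situation in which we would conclude that the feeding regimes account for much of the variation in height.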



1.7 The Scope of This Book 

We follow this chapter with a review chapter. Although we minimize the use of mathematical formulae, some are essential, and so we provide a review of mathematical notation in Chapter 2, along with the basic statistical concepts and methods used elsewhere in the book. Many readers will be familiar with these concepts and might treat this chapter as optional.

The early chapters of the book (Chapters 3 to 11) focus on the design of experiments and the analysis of data from designed experiments. In Chapter 3, we concentrate on the essential statistical principles of design: replication, randomization and blocking. We consider the structure of an experiment and describe some common designs. In Chapters 4 to 7, we consider analysis of simple designs. In Chapter 4, we consider in detail the analysis of data from the simplest design, the completely randomized design, to explain the concepts of ANOVA. We explain how the ANOVA table is formed, how it relates to a model for the data and how to interpret it. In Chapter 5, we explore the assumptions underlying the model and analysis and describe the diagnostic tools we can use to check them. We consider how these assumptions might be violated and the possible consequences, and ways to remedy these problems. In Chapter 6, we discuss transformations of the response variable as one remedy for failure to satisfy the model assumptions. In Chapter 7, we extend the analysis to the simplest design that includes blocking, the randomized complete block design, and introduce the concept of strata, or different structural levels, within a design and its analysis. In Chapters 8 to 11, we consider more advanced issues in the analysis of designed experiments. In Chapter 8, we consider how best to extract answers about our experimental hypotheses from our analysis. The advantages of factorial treatment structures, used to test the effects of several treatment factors simultaneously, will be explained. We describe the use of crossed and nested models for factorial structures, and how to make pairwise comparisons of treatments. In Chapter 9, we describe the analysis of some designs with somewhat more complex blocking structures, namely the Latin square, split-plot and balanced incomplete block designs. Then in Chapter 10, we consider how to calculate the replication required to obtain a specified precision for treatment comparisons in simple designs, and we introduce the concept of statistical power. We also discuss the case of equivalence testing, where the interest is in detecting equivalence rather than differences between treatments. Finally, Chapter 11 examines the issues that arise for non-orthogonal designs, where an unambiguous analysis can no longer be obtained.

In the later chapters of the book (Chapters 12 to 18) we turn our attention to regression modelling. In Chapter 12, after a brief general introduction, we concentrate first on simple linear regression, relating the response to a linear function of a single explanatory variate. The diagnostic tools introduced in Chapter 5 can be used for regression modelling, but additional diagnostic tools are available to check the validity of a regression analysis, and these are introduced in Chapter 13. In Chapters 14 and 15, we then extend regression models. In Chapter 14, we introduce multiple linear regression, extending the simple linear regression model to include several explanatory variates and considering problems of collinearity and variable selection. In Chapter 15, we show how to investigate the best form of a regression model when observations arise from different groups, how to incorporate simple designs into regression models and discuss analysis of covariance. We then move beyond linear regression. In Chapter 16, we introduce linear mixed models for the analysis of unbalanced studies where structure is present. In Chapter 17, we first use functions of explanatory variables to model curved relationships with linear models and then give a brief introduction to non-linear models. This concept is extended in Chapter 18 to the case of the generalized linear model, which can be used to model responses with certain types of non-Normal errors. We introduce two special, but commonly used, cases: the logit model for Binomial (proportion) data, and the log-linear model for Poisson (count) data.

Finally, the concluding chapter (Chapter 19) provides an overview of the full process of design and statistical analysis by way of real examples.

Our website (www.stats4biol.info) provides an overview and basic introduction to three commonly used statistical packages: GenStat, R and SAS. All of the examples are analysed with each of these packages, together with answers to selected exercises. Our personal preference is for the GenStat statistical software, because of its excellent implementation of algorithms for the analysis of designed experiments, and the provision of menus to make analyses easily accessible to all. The R package provides functions for all the standard analysis approaches introduced in this book, and has the benefits and drawbacks associated with being free, open-source software. We include SAS because of its wide user base and general availability. Most results presented in the book can be obtained with any of these packages; we comment where results may differ between packages and output has been obtained from a specific package.



2 

A Review of Basic Statistics 



This chapter briefly reviews some basic mathematical and statistical concepts that are fundamental to the material that comes later. Readers familiar with the mathematical notation commonly used to define summary statistics and with simple statistical tests, such as the t-test, can treat this chapter as optional revision or as reference material. We first introduce two commonly used statistics, the sample mean and sample variance, and in doing so define the mathematical notation we use throughout the rest of the book (Section 2.1). We then review random variables and probability distributions with particular reference to the Binomial distribution for discrete variables and the Normal distribution for continuous variables (Section 2.2). Later, we discuss statistical inference (Section 2.3), review one- and two-sample t-tests (Section 2.4), and discuss the concept of correlation (and covariance) between two variables (Section 2.5). To complete this chapter, we describe our conventions for presentation of calculations and numerical results (Section 2.6).



2.1 Summary Statistics and Notation for Sample Data 

As discussed in Section 1.2, when we obtain data from a study we regard them as a sample from the broader set of results that we might obtain if we repeated the experiment many times, and use statistical techniques to make inferences from our sample to this wider population. The first step in any analysis is to summarize the data. In defining the tools we use to do this, and later to analyse data, we often express the mathematical or statistical concepts algebraically using some standard mathematical notation. Symbols with predefined meanings, often Greek letters (e.g. $\mu$ and $\sigma$), and various shorthand expressions are commonly used. For example, in a laboratory experiment where several treatments are to be compared, we can use the letter N to represent the total number of observations made (e.g. N = 20), the letter t to represent the number of treatments or groups (e.g. t = 2), and, if they are equally replicated, the letter n to represent the number of replicates of each experimental treatment (e.g. n = 10). If the treatments are unequally replicated then the notation is extended by the use of subscripts to denote the replication for any given treatment, for example, $n_1 = 12$ and $n_2 = 8$ indicate that treatment 1 has 12 replicates and treatment 2 has eight. An individual response (datum or measurement) is often represented by a lower case italic letter (usually y) with an index (usually a subscript) to identify it uniquely. For example, $y_i$ might be used to denote the response from the ith observation. To specify a set of N responses we write $y_i$, i = 1 ... N (where 1 ... N denotes all integers from 1 to N). Such notation is useful as it allows us to write down general expressions or formulae applicable to any statistical analysis. For a particular data set, we then substitute the actual numerical values recorded in place of the symbols. Note that whilst there are some generally accepted conventions, notation often differs between books and subject areas. In this section we define notation to be used throughout this book. Occasionally, we have used the same symbol to represent different quantities in different contexts. We have tried to minimize this practice, as it is potentially confusing, and try to explain all of our notation as it is introduced.

Many statistical formulae are written as sums of several components. The Greek letter $\Sigma$ (capital sigma) is commonly used to denote the sum of a set of values defined by their index numbers, over a range with the lower limit specified by a subscript and the upper limit specified by a superscript, so $\sum_{i=1}^{N}$ specifies a sum over the index numbers from 1 up to N. For example, the sum (or total) of a set of N responses, labelled as $y_i$, i = 1 ... N, would be written as

$$y_1 + y_2 + \dots + y_{N-1} + y_N = \sum_{i=1}^{N} y_i .$$

For brevity we sometimes write $\sum_i$ to indicate summation over all available values of index i.

An important summary statistic, the sample mean (or sample grand mean), defined as the arithmetic mean of the N data values and denoted $\bar{y}$, would then be written

$$\bar{y} = \frac{1}{N} \sum_{i=1}^{N} y_i . \tag{2.1}$$

The use of the summation symbol $\Sigma$ has simplified and generalized the expression for the sample mean, which is the sum of all responses from label 1 to label N, divided by the number of observations, N.

Sometimes it is useful to label observations within treatment groups using two (or more) subscripts. So in an experiment with t = 2 treatments and n = 10 replicates of each treatment, the resulting set of responses might be concisely represented as $y_{jk}$, j = 1, 2, k = 1 ... 10, where the index j indicates the treatment applied and the index k labels the replicates within treatments. Formulae may then be simplified by using 'double sums', for example, the expression

$$\sum_{j=1}^{2} \sum_{k=1}^{10} y_{jk}$$

represents summation over 20 responses. The two indices are summed over in turn, the 'inner' (or rightmost) sum being executed first (here for values of index k from 1 to 10), to give

$$\sum_{j=1}^{2} \sum_{k=1}^{10} y_{jk} = \sum_{j=1}^{2} (y_{j1} + y_{j2} + \dots + y_{j10}) .$$

Then the 'outer' (or leftmost) sum is executed (here across values of index j from 1 to 2) to give a sum across all combinations of the two indices and hence the full set of observations. We adapt and extend these basic forms of notation as necessary throughout this book.
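The order of evaluation in a double sum can be mirrored directly in code. In this minimal sketch the responses are hypothetical integers chosen so that the totals are easy to check by hand, and the list `y` is our own construction, with `y[j][k]` holding the kth replicate of the jth treatment (Python counts from zero, whereas the indices in the text count from one).

```python
# Hypothetical responses for t = 2 treatments with n = 10 replicates each.
# The integers 1..20 are used only so the sums are easy to verify by hand.
y = [
    list(range(1, 11)),   # treatment j = 1: values 1, 2, ..., 10
    list(range(11, 21)),  # treatment j = 2: values 11, 12, ..., 20
]

# Inner sum: add over the replicate index k, separately for each treatment j.
inner_sums = [sum(row) for row in y]

# Outer sum: add the inner sums over the treatment index j.
double_sum = sum(inner_sums)

# The same total is obtained by summing the 20 responses in a single pass.
flat_sum = sum(value for row in y for value in row)

print(inner_sums)   # [55, 155]
print(double_sum)   # 210
print(flat_sum)     # 210
```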
When data are identified by more than one index, it is common to express totals and means using the 'dot notation'. Suppose that for t treatment groups, $y_{jk}$ identifies the kth replicate response belonging to the jth treatment group. Then $y_{j\cdot}$ and $\bar{y}_{j\cdot}$ represent the group total and group mean response, respectively, for observations on the jth treatment. If $n_j$ is used to represent the number of observations in that jth group, then these expressions can be written algebraically as

$$y_{j\cdot} = \sum_{k=1}^{n_j} y_{jk} , \qquad \bar{y}_{j\cdot} = \frac{1}{n_j} \sum_{k=1}^{n_j} y_{jk} .$$

Here, the dot symbol in the position of index k indicates summation across the observations for all possible values of that index, i.e. for k = 1 ... $n_j$, with the other index (or, in general, indices) kept constant. The bar symbol over the y indicates that the mean is taken by dividing the total by the number of observations included in the summation. Note that within this system of notation the overall sample mean should strictly be denoted $\bar{y}_{\cdot\cdot}$ but, for simplicity, the dots are generally omitted here; therefore we use $\bar{y}$ to represent the sample grand mean. The overall and group means are conventionally used as summaries of the location (average response) for the whole sample and particular treatment groups, respectively, and as estimates for the corresponding values in the population from which the sample has been taken.
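As a small illustration of the dot notation, the group totals, group means and grand mean can be computed as follows. The responses and the names `responses`, `group_totals` and `group_means` are hypothetical, introduced only for this sketch; the two groups are deliberately given different numbers of observations so that the role of the group sizes is visible.

```python
# Hypothetical responses, keyed by treatment index j; the group sizes differ
# on purpose (3 observations in group 1, 2 observations in group 2).
responses = {
    1: [12.0, 15.0, 9.0],
    2: [20.0, 22.0],
}

# Group totals: sum over the replicate index k, with j held constant.
group_totals = {j: sum(ys) for j, ys in responses.items()}

# Group means: each group total divided by that group's own size.
group_means = {j: group_totals[j] / len(ys) for j, ys in responses.items()}

# Grand mean: total over both indices divided by the total number of
# observations N.
n_total = sum(len(ys) for ys in responses.values())
grand_mean = sum(group_totals.values()) / n_total

print(group_totals)  # {1: 36.0, 2: 42.0}
print(group_means)   # {1: 12.0, 2: 21.0}
print(grand_mean)    # 15.6
```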

The sample variance quantifies the amount of variation, or spread, in the sample about its mean. For responses $y_i$, i = 1 ... N, the sample variance can be expressed algebraically as

$$\frac{1}{N} \sum_{i=1}^{N} (y_i - \bar{y})^2 . \tag{2.2}$$

In Equation 2.2, the N deviations of the individual responses about the overall sample mean are squared and then added together, and the result is divided by the number of observations, N. The process of subtracting the sample mean from all of the responses is known as centering, so the variance is proportional to the sum of the squared centered responses. We might think of using this quantity to estimate the variance of the population from which the sample has been taken, but that estimator is biased and tends to underestimate the population variance. We therefore use a scaled version of the sample variance to estimate the population variance, known as the unbiased sample variance and here denoted $s^2$, written as

$$s^2 = \frac{1}{N-1} \sum_{i=1}^{N} (y_i - \bar{y})^2 . \tag{2.3}$$

These sample variance statistics are not on the same measurement scale as the original responses. However, their square roots do have the same units as the response, and we usually choose to work with the unbiased sample standard deviation, s. The coefficient of variation (%CV) expresses the unbiased sample standard deviation as a percentage of the sample mean, calculated as

$$\%CV = 100 \times \frac{s}{\bar{y}} .$$

This quantity is sometimes used as a measure of the relative precision of an experiment, particularly in the context of field experiments. However, %CV provides a sensible summary statistic only for variables measured relative to an absolute (as opposed to arbitrary) zero that forms a lower limit for observed values.



EXAMPLE 2.1A: WHEAT YIELDS* 

A total of N = 7 measurements of the yield of a commercial variety of wheat were obtained from a field trial. The samples were converted into equivalent yields per hectare as: 7, 9, 6, 12, 4, 6 and 9 t/ha. The sample mean is calculated (as in Equation 2.1) as

$$\bar{y} = \frac{7 + 9 + 6 + 12 + 4 + 6 + 9}{7} = \frac{53}{7} = 7.5714 ,$$

or 7.57 when rounded to two decimal places. Using Equation 2.3, we calculate the unbiased sample variance as

$$s^2 = \frac{(7 - 7.57)^2 + (9 - 7.57)^2 + \dots + (9 - 7.57)^2}{(7 - 1)} = \frac{41.71}{6} = 6.95 .$$

Note that although we have shown values rounded to two decimal places within this calculation, we actually perform this (and all following) calculations using full accuracy, as explained in Section 2.6. The unbiased sample standard deviation is then calculated directly as $s = \sqrt{6.95} = 2.64$ t/ha, with %CV = 100 × 2.64/7.57 = 34.82. This is larger than is usual for a well-managed agricultural trial, but is based on a small number of values and so may be poorly estimated.
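The summary statistics in this example can be reproduced with Python's standard `statistics` module, as in the sketch below. The yields are those given in the example; the variable names are our own. In that module, `variance` and `stdev` use the divisor N - 1 (the unbiased form), whereas `pvariance` uses the divisor N.

```python
import statistics

# Wheat yields (t/ha) from Example 2.1A.
yields = [7, 9, 6, 12, 4, 6, 9]

sample_mean = statistics.mean(yields)

# Variance with divisor N (the biased form).
variance_divisor_n = statistics.pvariance(yields)

# Unbiased sample variance and standard deviation (divisor N - 1).
unbiased_variance = statistics.variance(yields)
unbiased_sd = statistics.stdev(yields)

# Coefficient of variation: standard deviation as a percentage of the mean.
percent_cv = 100 * unbiased_sd / sample_mean

print(f"mean              = {sample_mean:.4f}")        # 7.5714
print(f"unbiased variance = {unbiased_variance:.2f}")  # 6.95
print(f"unbiased sd       = {unbiased_sd:.2f}")        # 2.64
print(f"%CV               = {percent_cv:.2f}")         # 34.82
print(f"variance (div. N) = {variance_divisor_n:.2f}")
```

Because the calculation is carried out at full accuracy and rounded only for display, the printed values agree with those quoted in the example.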

To make statistical inferences on samples, we usually make some assumptions about the probability distribution underlying the data, and we introduce this concept in the next section.



2.2 Statistical Distributions for Populations 

Before discussing probability distributions, we want you to understand the concept of a random variable. A random variable represents the possible outcomes of a stochastic process, i.e. a process that is not deterministic, but includes some unpredictable variation. Conventionally, random variables are represented by upper case symbols, with realizations of the variable represented by lower case symbols. For example, we might denote yield from a field plot as a random variable Y, with the realized yield denoted y. When defining models in later chapters, we will often not make this distinction, and simply use the lower case symbols.

We use probability distributions to help us make inferences from data. If we can realistically assume that the population of possible outcomes from an experiment behaves like a sample of a random variable from a certain probability distribution, then we can use known properties of that distribution to derive inferences for the population from our observations.

The mathematical theory underlying probability distributions requires a distinction to be made between discrete and continuous random variables. A discrete random variable is one that can take only a certain pre-specified set of possible values, such as integer counts. A continuous random variable may take any real value within its defined range. In this book, we are primarily concerned with continuous random variables that follow a Normal distribution, except in Chapter 18 where we consider discrete random variables following either Binomial or Poisson distributions. However, as it is easier to understand the concepts associated with probability distribution functions by working with discrete distributions, we start with definitions for this type of random variable, using the Binomial distribution as an illustration.

2.2.1 Discrete Data 

As stated above, a discrete random variable Y can take only a certain pre-specified set of 
possible values, which we denote by S. The probability distribution associated with Y is a 
function that gives the probability of observing a particular value, y, called a point probability, denoted as $P_Y(y) = \mathrm{Prob}(Y = y)$ ('the probability that variable Y takes value y'). It is also often useful to consider the cumulative distribution function, denoted $F_Y$, defined as the probability that the random variable is less than or equal to a certain value y, written as

$$F_Y(y) = \mathrm{Prob}(Y \le y) = \sum_{v \le y} P_Y(v) ,$$



i.e. the sum of point probabilities over all values v in the set S that are less than or equal 
to the target value y. This function takes values between zero (for values of y less than 
the minimum value of Y) and one (for values of y equal to or greater than the maximum 
value of Y). 

The Binomial distribution is an example of a discrete probability distribution. It is usu- 
ally derived as the distribution of the number of successes out of a series of m independent 
binary trials (i.e. trials with only two possible outcomes: success or failure), where each 
trial has an equal probability of success, denoted p. For example, the number of heads (successes) obtained after the tossing of two separate coins (m = 2) can take the values y = 0, 1 or 2, and follows a Binomial distribution with success probability p = 0.5 (for fair coins). The Binomial probability distribution takes the form

$$\mathrm{Prob}(Y = y; m, p) = \frac{m!}{y! \, (m - y)!} \, p^y (1 - p)^{m - y} . \tag{2.4}$$

The calculated probability of y successes is then a function of the number of successes, y, the number of tests, m, which is known, and the probability of success p, which is often unknown. Note that $p^0 = (1 - p)^0 = 1$, and the factorial function x! is defined for any positive integer x as the product of all positive integers less than or equal to x, i.e.

$$x! = x \times (x - 1) \times (x - 2) \times \dots \times 1 ,$$

so that 1! = 1, 2! = 2, 3! = 6 and so on. A value is also needed for x = 0, and by convention 0! is defined to be equal to 1.

EXAMPLE 2.2A: PLANT INFECTION*

Consider an experiment in which three plants in a pot are inoculated with a virus, where 
each plant has a 40% chance of becoming infected. The number of plants (0-3) that show 
symptoms several days after inoculation can be considered to follow a Binomial distri- 
bution. The possible values that can be observed (the set S) are the integers 0, 1, 2, 3. The 
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number of tests here is the number of plants, so ni = 3, and the probability of success is 
the probability of infection, sop = 0.4. We can calculate the probability of each outcome 
using Equation 2.4 as follows; 

Prob(Y = 0; m = 3, p = 0.4) = 1 x 0.6^ = 0.216 

Prob(Y = 1; m = 3, p = 0.4) = 3 x 0.4 x 0.6^ = 0.432 

Prob(Y = 2; m = 3, p = 0.4) = 3 x 0.4^ x 0.6 = 0.288 

Prob(Y = 3; m = 3, p = 0.4) = 1 x 0.4^ = 0.064 . 

The cumulative distribution can be derived directly as 

Prob(Y < 0; m = 3, p = 0.4) = Prob(Y = 0) = 0.216 

Prob(Y < 1; ra = 3, p = 0.4) = Prob(Y < 0) + Prob(Y = 1) = 0.216 + 0.432 = 0.648 

Prob(Y < 2; m = 3, p = 0.4) = Prob(Y < 1) + Prob(Y = 2) = 0.648 + 0.288 = 0.936 

Prob(Y < 3; m = 3, p = 0.4) = Prob(Y < 2) + Prob(Y = 3) = 0.936 + 0.064 = 1.000 . 

This cumulative distribution function is shown in Figure 2.1a. It is a discontinuous 
function, defined on the range 0-3, with jumps at the values in S. The function values 
are shown using solid lines and filled circles; the open circles and dashed lines are used 
to join the discontinuous segments. 
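
The step-function behaviour of this cumulative distribution can be sketched in Python as follows. The point probabilities are those calculated in this example; the variable names and the helper `cdf` are our own and are used for illustration only.

```python
from itertools import accumulate

# Point probabilities for y = 0, 1, 2, 3 from Example 2.2A (m = 3, p = 0.4)
values = [0, 1, 2, 3]
point_probs = [0.216, 0.432, 0.288, 0.064]

# Running totals give Prob(Y <= y) at each value in S
cumulative = list(accumulate(point_probs))

def cdf(y):
    """Step function F_Y(y): sum of point probabilities for values in S
    that do not exceed y."""
    return sum(p for v, p in zip(values, point_probs) if v <= y)

for v, F in zip(values, cumulative):
    print(v, round(F, 3))
```

Because the function only jumps at the values in S, `cdf(1.5)` returns the same value as `cdf(1)`, and any argument below zero returns zero.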

The inverse of the cumulative distribution function is known as the quantile func- 
tion. The quantiles of a distribution divide its range into intervals such that each inter- 
val contains an equal proportion of the distribution. Special cases include the median 
(which divides the distribution into two parts) and the quartiles (four parts). The inter- 
quartile range (first to third quartile, or central part of the distribution) gives a measure 
of the spread of a distribution. Percentiles are often used and divide the distribution into 
100 parts. Hence, the median and the first and third quartiles can alternatively be termed
the 50th, 25th and 75th percentiles, respectively. Quantiles for common distributions are
widely available in statistical software and in books of statistical tables. Formally, the q
quantile (for 0 < q < 1) can be defined as the value v satisfying

$$v = \min\{u \in S : F_Y(u) \ge q\} .$$

The symbol $\in$ means 'is an element of', so the q quantile is the smallest value in the set S
that has cumulative distribution function value greater than or equal to q.

FIGURE 2.1
(a) Cumulative distribution function for plant infection data (Example 2.2A) and (b) with 0.5 quantile marked
(Example 2.2B).

EXAMPLE 2.2B: PLANT INFECTION*

We can find any quantile for the distribution underlying the plant infection experiment
from the cumulative distribution function shown in Figure 2.1a. For example, suppose
we wish to find the median, i.e. quantile q = 0.5. In Figure 2.1b, we draw a horizontal line
at height 0.5, and find that the smallest valid value (i.e. in the set 0, 1, 2, 3) with cumulative
probability greater than or equal to this value is 1; hence, 1 is the median value for this distribution.
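
The same search can be written as a short Python sketch: step through the values in S in increasing order and stop at the first one whose cumulative probability reaches q. The point probabilities are held as exact fractions here purely to avoid rounding error in the running total; the function name `quantile` is our own.

```python
from fractions import Fraction

# Exact point probabilities from Example 2.2A, held as fractions
point_probs = {0: Fraction(216, 1000), 1: Fraction(432, 1000),
               2: Fraction(288, 1000), 3: Fraction(64, 1000)}

def quantile(q):
    """Smallest value u in S whose cumulative probability F_Y(u) is at least q."""
    if not 0 < q < 1:
        raise ValueError("q must satisfy 0 < q < 1")
    running_total = Fraction(0)
    for u in sorted(point_probs):
        running_total += point_probs[u]
        if running_total >= q:
            return u

print(quantile(Fraction(1, 2)))  # 1, the median found in Example 2.2B
```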

The mean, or expected value, of a discrete random variable Y is calculated as

$$E(Y) = \sum_{y \in S} y\, P_Y(y) . \qquad (2.5)$$

This equation is interpreted as 'the sum, over all possible values of Y (i.e. for $y \in S$), of the
values multiplied by their point probabilities'. This is a measure of the location (average or
mean value) of the distribution. Similarly, the spread of the distribution is measured by its
variance, which can be expressed as

$$\text{Var}(Y) = \sum_{y \in S} [y - E(Y)]^2\, P_Y(y) . \qquad (2.6)$$

This expression (Equation 2.6) writes the variance as the sum, over all the possible values
of Y, of the squared deviation of each value from the mean, multiplied by its point
probability. We can interpret these quantities as the mean and variance of a population that
follows the given probability distribution. Unsurprisingly, the expression for the variance
of the random variable in Equation 2.6 has a similar structure to that for the variance of a
sample (Equation 2.2) and we explore this connection further below.

The expected value (mean) of the Binomial distribution takes the form

$$E(Y) = \sum_{y=0}^{m} y\, P_Y(y; m, p) = \sum_{y=0}^{m} y\, \frac{m!}{y!\,(m-y)!}\, p^y (1-p)^{m-y} = mp ,$$

with variance

$$\text{Var}(Y) = \sum_{y=0}^{m} (y - mp)^2\, \frac{m!}{y!\,(m-y)!}\, p^y (1-p)^{m-y} = mp(1 - p) .$$

Obtaining the simplified forms requires mathematical manipulations outside the scope of
this book (see for example Wackerly et al., 2007). Note that both the mean and the variance
are functions of the population parameters m and p.

EXAMPLE 2.2C: PLANT INFECTION*

We can now use the probability distribution obtained in Example 2.2A to calculate the 
mean and variance of the distribution of the number of infected plants. The distribution 
mean is calculated as 

$$E(Y) = (0 \times 0.216) + (1 \times 0.432) + (2 \times 0.288) + (3 \times 0.064) = 1.2 ,$$

and we can verify directly that E(Y) = 1.2 = 3 × 0.4 = mp. Similarly, we can calculate the
distribution variance as

$$\begin{aligned}
\text{Var}(Y) &= [(0 - 1.2)^2 \times 0.216] + [(1 - 1.2)^2 \times 0.432] + [(2 - 1.2)^2 \times 0.288] + [(3 - 1.2)^2 \times 0.064] \\
&= 0.3110 + 0.0173 + 0.1843 + 0.2074 \\
&= 0.72 ,
\end{aligned}$$

and again we can verify directly that Var(Y) = 0.72 = 3 × 0.4 × 0.6 = mp(1 - p).
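
These hand calculations are easily checked numerically. The sketch below applies the expected value and variance formulae (Equations 2.5 and 2.6) to the point probabilities from Example 2.2A; the variable names are our own.

```python
values = [0, 1, 2, 3]
point_probs = [0.216, 0.432, 0.288, 0.064]  # from Example 2.2A

# Equation 2.5: sum of each value times its point probability
mean = sum(y * p for y, p in zip(values, point_probs))

# Equation 2.6: sum of squared deviations from the mean times point probability
variance = sum((y - mean) ** 2 * p for y, p in zip(values, point_probs))

m, p = 3, 0.4
print(round(mean, 4), round(variance, 4))          # 1.2 0.72
print(round(m * p, 4), round(m * p * (1 - p), 4))  # 1.2 0.72
```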

In practice, the true probability distribution of any sample is usually unknown. If a data
set can be considered as a set of samples from the same underlying distribution, then the
empirical probability distribution of the sample, i.e. the relative frequency of each value
within the sample, gives information on the form of that underlying probability
distribution. The relative frequency is defined as the frequency of each value as a proportion of
the total number of values and gives an estimate of each point probability, and can be
graphically represented by using a bar chart. The sample mean (Equation 2.1) can then be
calculated from the empirical probability distribution using the formula for the expected
value (Equation 2.5) after substitution of the relative frequencies for the unknown point
probabilities. The sample variance (Equation 2.2) can similarly be calculated using the
formula for the variance (Equation 2.6). In this sense, the sample mean and variance can
be seen as estimates of the true mean and variance of the underlying random variable,
although in practice we usually use the unbiased sample variance (Equation 2.3) to get an
unbiased estimate of the variance of the random variable.

There are various different definitions of the sample quantiles, and we use one of the
simpler (but common) definitions. For a sample $y_1 \ldots y_N$, the kth sample percentile is found
in several steps. First, the sample is put into order of increasing size of its values. Then the
index number, j, within the ordered set is calculated as

$$j = (N + 1) \times k/100 .$$

If j is an integer value then the kth sample percentile is the jth value in the ordered set.
If j is not an integer value, then let l denote the largest integer smaller than j (i.e. the next
smallest integer). The kth sample percentile is then defined as the average of the lth and
(l + 1)th values in the ordered set.
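
The steps above can be sketched in Python as follows. The function name `sample_percentile` and the small data set are hypothetical and used for illustration only; statistical packages often use other definitions of sample quantiles, so their results may differ slightly from this rule.

```python
import math

def sample_percentile(data, k):
    """kth sample percentile using the index j = (N + 1) x k / 100."""
    ordered = sorted(data)
    n = len(ordered)
    j = (n + 1) * k / 100
    if j < 1 or j > n:
        raise ValueError("percentile index falls outside the ordered sample")
    if j == int(j):
        return ordered[int(j) - 1]      # jth value (Python lists are 0-indexed)
    lower = math.floor(j)               # largest integer smaller than j
    return (ordered[lower - 1] + ordered[lower]) / 2

sample = [3, 1, 4, 1, 5, 9, 2]          # hypothetical sample, N = 7
print(sample_percentile(sample, 50))    # j = 4, so the 4th ordered value: 3
print(sample_percentile(sample, 30))    # j = 2.4, average of 2nd and 3rd values: 1.5
```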

EXAMPLE 2.2D: PLANT INFECTION*

Suppose that our plant infection experiment is now carried out with 20 pots, each
containing three plants, with the following numbers of plants per pot becoming infected:
0, 2, 1, 1, 0, 1, 1, 1, 2, 1, 3, 1, 3, 1, 0, 0, 0, 2, 1 and 1, i.e. no plants infected in five pots, one
plant infected in 10 pots, two plants infected in three pots, and three plants infected in
two pots. The empirical probability distribution is thus

Prob(Y = 0) = 5/20 = 0.25

Prob(Y = 1) = 10/20 = 0.50

Prob(Y = 2) = 3/20 = 0.15

Prob(Y = 3) = 2/20 = 0.10.

This empirical distribution is shown as a bar chart in Figure 2.2. The sample mean can
be calculated either directly from the data values (i.e. as 22/20 = 1.1), or via the empirical
probability distribution as

$$E(Y) = (0 \times 0.25) + (1 \times 0.50) + (2 \times 0.15) + (3 \times 0.10) = 1.1 ,$$

which is a slight underestimate of the true population mean (obtained as 1.2 in
Example 2.2C). Similarly, the sample variance can be either calculated directly, or via
the empirical distribution as

$$\begin{aligned}
\text{Var}(Y) &= [(0 - 1.1)^2 \times 0.25] + [(1 - 1.1)^2 \times 0.50] + [(2 - 1.1)^2 \times 0.15] + [(3 - 1.1)^2 \times 0.10] \\
&= 0.3025 + 0.0050 + 0.1215 + 0.3610 \\
&= 0.79 .
\end{aligned}$$

We can convert this into the unbiased sample variance by multiplying by N/(N - 1),
giving 0.79 × 20/19 = 0.83 as an estimate of the variance of the underlying random variable.
For this sample, this estimate is larger than the true value of the population variance
(obtained as 0.72 in Example 2.2C).

To calculate sample quantiles, we list the observations in order as 0, 0, 0, 0, 0, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3. The sample median (k = 50, so k/100 = 1/2) requires
j = 21 × 1/2 = 10.5. The 10th and 11th values in the ordered set are both 1 and hence
the sample median is 1. The sample lower quartile (k = 25, k/100 = 1/4) requires
j = 21 × 1/4 = 5.25, with estimate 0.5 (the average of the 5th value, 0, and the 6th value,
1, from the ordered set) and the sample upper quartile (k = 75, k/100 = 3/4) requires
j = 21 × 3/4 = 15.75, with estimate 1.5 (the average of the 15th value, 1, and the 16th
value, 2, from the ordered set).

FIGURE 2.2
Bar chart showing the empirical probability distribution of the number of infected plants in the plant infection
trial. Three plants were tested in each of 20 pots (Example 2.2D). Horizontal axis: number of infected plants.

These sample statistics deviate from those associated with the theoretical
distribution calculated in Examples 2.2B and C because of variations inherent in the sampling
process - if the experiment were repeated, then somewhat different results would be
obtained each time.
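
The empirical distribution and the sample statistics of this example can be reproduced with a short Python sketch. The data are the 20 pot counts listed above; the variable names are our own.

```python
from collections import Counter

infected = [0, 2, 1, 1, 0, 1, 1, 1, 2, 1, 3, 1, 3, 1, 0, 0, 0, 2, 1, 1]
n = len(infected)

# Relative frequency of each observed value estimates its point probability
counts = Counter(infected)
rel_freq = {value: counts[value] / n for value in sorted(counts)}

# Equations 2.5 and 2.6 with relative frequencies in place of point probabilities
mean = sum(v * f for v, f in rel_freq.items())
variance = sum((v - mean) ** 2 * f for v, f in rel_freq.items())

# Multiplying by N / (N - 1) gives the unbiased sample variance
unbiased_variance = variance * n / (n - 1)

print(rel_freq)
print(round(mean, 2), round(variance, 2), round(unbiased_variance, 2))
```

The rounded results agree with the values 1.1, 0.79 and 0.83 obtained by hand.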



2.2.2 Continuous Data 

A continuous random variable can take any real value within a defined range. For exam- 
ple, plant heights can take any value greater than zero. Theoretically, therefore, there are 
infinitely many possible values, each with negligible probability (because there are so 
many possibilities), and the formulae for discrete random variables have to be adapted to
take this into account. In this context, we refer to density functions rather than distribution
functions and we work with integrals rather than sums.

It is helpful in this case to start with the cumulative density function (CDF), defined
as in the discrete case as $F_Y(y) = \text{Prob}(Y \le y)$, which again takes values between zero (for
values at or below the minimum value of Y) and one (for values at or above the maximum
value of Y). The probability density function (PDF), $f_Y(y)$, can be interpreted as the
probability of Y falling in the range $y < Y < y + \delta$, divided by $\delta$, as $\delta$ decreases to zero, which is
the derivative of the CDF. In mathematical terms, the CDF is written in terms of the PDF as

$$F_Y(y) = \int_{v=-\infty}^{v=y} f_Y(v)\, dv .$$

Informally, this can be interpreted as meaning that the integral ($\int$) sums the probabilities
$f_Y(v)$ across all the possible values of v between the lower limit (here, minus infinity) and
the upper limit, the target value y. As such, this is directly analogous to the formula in
the discrete case. The quantile function is now defined straightforwardly in terms of the
inverse CDF with quantile q (0 < q < 1) defined as the value v such that

$$v = F_Y^{-1}(q) .$$

The formulae for the expected value and variance of the random variable are also
analogous to those in the discrete case, but using integrals in place of summations. Hence, the
expected value of the random variable is written as

$$E(Y) = \int_{y=-\infty}^{y=\infty} y\, f_Y(y)\, dy ,$$

and the variance of the random variable is written as

$$\text{Var}(Y) = \int_{y=-\infty}^{y=\infty} (y - E(Y))^2 f_Y(y)\, dy .$$

A histogram, the continuous analogue of the bar chart, can be used to give information on
the shape of the empirical PDF. For continuous variables, data values have to be grouped
into contiguous intervals. In the simple case where all intervals have equal width, then the
relative frequency of observations in each interval is plotted. If intervals are of unequal
widths then, for each interval, the relative frequency is divided by the interval width, so
that the area under the histogram in each interval is equal to its relative frequency. As in the
discrete case, the sample mean and variance can be considered as estimates of the expected
value and variance of the random variable, although again we usually use the unbiased
sample variance to get an unbiased estimate of the variance of the random variable.
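
The scaling for unequal interval widths can be illustrated with a minimal Python sketch. The data, the interval boundaries and the function name `histogram_density` are hypothetical, and the convention that each interval includes its lower boundary but not its upper boundary is an assumption made for this sketch.

```python
def histogram_density(data, edges):
    """Bar heights for a histogram with possibly unequal interval widths.

    Interval i covers edges[i] <= y < edges[i + 1] (an assumed convention).
    """
    n = len(data)
    heights = []
    for left, right in zip(edges[:-1], edges[1:]):
        count = sum(1 for y in data if left <= y < right)
        rel_freq = count / n
        heights.append(rel_freq / (right - left))  # bar area equals rel_freq
    return heights

# Hypothetical measurements and unequal intervals, for illustration only
data = [1.2, 1.9, 2.1, 2.4, 2.8, 3.5, 4.0, 5.5, 6.1, 7.9]
edges = [1, 2, 3, 4, 8]
heights = histogram_density(data, edges)
areas = [h * (r - l) for h, l, r in zip(heights, edges[:-1], edges[1:])]
print(heights)
print(sum(areas))
```

The last interval is four times wider than the others, so its height is its relative frequency (0.4) divided by 4, and the bar areas still sum to one (up to rounding error).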

EXAMPLE 2.3A: WILLOW BEETLE MEASUREMENTS 

A sample of 50 willow beetles (Phratora vulgatissima) was taken from a willow crop 
located close to Bristol, UK, and various characteristics were measured, including the 
total length and width of each beetle (Peacock et al., 2003). The data are presented in 
Table 2.1 and can be found in file willow.dat. 

The length measurements ranged from 4.10 to 4.95 mm. The sample mean for length 
was 4.552 mm with unbiased sample variance 0.0260 mm² and standard deviation
0.1611 mm. The empirical probability distribution is illustrated using a histogram in 
Figure 2.3. The histogram uses 10 intervals of length 0.1 mm, starting at 4 mm length. 

The relative frequencies plotted are calculated as the number of observations in each 
interval divided by N = 50. In this case, the sample median is the average of the 25th 
and 26th values in the ordered set of observations (smallest first), which is 4.55 mm. The 
index number for the lower quartile is j = 12.75, so the lower quartile is the average of 
the 12th and 13th values in the ordered set, here 4.45 mm, slightly smaller than the mean. 
Similarly, the index number for the upper quartile is j = 38.25, so the upper quartile is the
average of the 38th and 39th values in the ordered set, here 4.65 mm. The inter-quartile 
range is thus 0.20 mm, which is slightly larger than the standard deviation. 

TABLE 2.1
Length and Width (mm) of 50 Willow Beetles (Phratora vulgatissima) Sampled from a Willow Crop
Located Close to Bristol, UK (Example 2.3A and File willow.dat)

Beetle  Length  Width     Beetle  Length  Width     Beetle  Length  Width
1       4.60    1.50      18      4.55    1.50      35      4.60    1.60
2       4.70    1.65      19      4.60    1.70      36      4.55    1.65
3       4.50    1.55      20      4.55    1.60      37      4.775   1.55
4       4.55    1.65      21      4.35    1.60      38      4.60    1.60
5       4.75    1.65      22      4.45    1.60      39      4.45    1.475
6       4.40    1.50      23      4.55    1.55      40      4.60    1.60
7       4.20    1.70      24      4.35    1.55      41      4.65    1.625
8       4.70    1.55      25      4.65    1.65      42      4.725   1.65
9       4.55    1.60      26      4.50    1.55      43      4.95    1.725
10      4.70    1.65      27      4.45    1.50      44      4.65    1.65
11      4.65    1.55      28      4.45    1.60      45      4.60    1.625
12      4.50    1.55      29      4.10    1.40      46      4.45    1.55
13      4.30    1.50      30      4.50    1.50      47      4.30    1.525
14      4.65    1.65      31      4.60    1.60      48      4.75    1.60
15      4.75    1.65      32      4.75    1.575     49      4.525   1.55
16      4.65    1.60      33      4.70    1.65      50      4.35    1.50
17      4.45    1.60      34      4.35    1.55

Source: Data from Rothamsted Research (A. Karp).

FIGURE 2.3
Histogram of relative frequencies for lengths (mm) of willow beetles from a sample of size 50 (Example 2.3A).
Horizontal axis: beetle length.



2.2.3 The Normal Distribution 

In this book we assume in most instances that random variables follow a Normal distribu- 
tion (sometimes called the Gaussian distribution), which approximately describes many 
types of continuous measurements, such as lengths, weights and so forth. The PDF for the
Normal distribution is a bell-shaped symmetric curve, taking its maximum value at the
mean (Figure 2.4a). As for all symmetric distributions, the median of this distribution is
equal to its mean. The Normal distribution is defined by two parameters, the mean, $\mu$, and
the variance, $\sigma^2$, and its PDF takes the form

$$f_Y(y) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(y - \mu)^2}{2\sigma^2} \right\} ,$$

where exp() denotes the exponential function. Where a random variable, Y say, is
assumed to follow a Normal distribution, it is conventional to write $Y \sim \text{Normal}(\mu, \sigma^2)$ (or
'Y follows a Normal distribution with mean $\mu$ and variance $\sigma^2$'). It is useful to remember
that approximately 68% of the distribution lies within one standard deviation ($\sigma$) of the
mean, i.e. in the range from ($\mu - \sigma$) to ($\mu + \sigma$). The inter-quartile range (middle 50% of
the distribution) is therefore smaller than twice the standard deviation. In addition,
approximately 95% of the distribution lies within two standard deviations of the mean,
i.e. in the range from ($\mu - 2\sigma$) to ($\mu + 2\sigma$), and almost all of the distribution (more than
99.7%) lies within three standard deviations of the mean. Figure 2.4 shows these
properties in terms of both the PDF (Figure 2.4a) and its integral, the CDF (Figure 2.4b). So,
for example, the 2.3% of the distribution lying above $y = \mu + 2\sigma$ corresponds to a CDF
value of

$$\text{Prob}(Y \le \mu + 2\sigma) = 1 - 0.023 = 0.977 .$$

FIGURE 2.4
(a) PDF and (b) CDF of a Normal random variable with mean $\mu$ and standard deviation $\sigma$.

The Normal distribution has the useful property that any linear function of a Normal
random variable also has a Normal distribution. So if $Y \sim \text{Normal}(\mu, \sigma^2)$, then for known
constants a and b, the random variable Z = aY + b has a Normal distribution with mean
$a\mu + b$ and variance $a^2\sigma^2$, i.e. $Z \sim \text{Normal}(a\mu + b, a^2\sigma^2)$. Z is conventionally used to represent
the standard Normal distribution with mean 0 and variance 1, obtained by setting $a = 1/\sigma$
and $b = -\mu/\sigma$ to centre and standardize any Normal distribution, i.e.

$$Z = \frac{Y - \mu}{\sigma} \sim \text{Normal}(0, 1) .$$



Quantiles of the standard Normal distribution are widely available in both books of
statistical tables and statistical packages.
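
As one illustration, the `NormalDist` class in the Python standard library (`statistics` module, Python 3.8 or later) provides the CDF and its inverse. The sketch below checks the proportions quoted above and shows the effect of standardizing; the values of `mu` and `sigma` are hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Proportion of the distribution within 1, 2 and 3 standard deviations of the mean
for k in (1, 2, 3):
    within = std_normal.cdf(k) - std_normal.cdf(-k)
    print(k, round(within, 4))   # 0.6827, 0.9545 and 0.9973 respectively

# Quantiles come from the inverse CDF; the 97.5th percentile is about 1.96
print(round(std_normal.inv_cdf(0.975), 3))

# Standardizing: Prob(Y <= y) equals Prob(Z <= (y - mu) / sigma)
mu, sigma = 10.0, 2.0            # hypothetical values
y = mu + 2 * sigma
z = (y - mu) / sigma
print(round(NormalDist(mu, sigma).cdf(y), 3), round(std_normal.cdf(z), 3))  # both 0.977
```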

The sum of a set of Normal random variables is also a Normal random variable. In
particular, for a set of N independent (uncorrelated) Normal random variables $Y_1 \ldots Y_N$, with
common mean $\mu$ and variance $\sigma^2$, then

$$\sum_{i=1}^{N} Y_i \sim \text{Normal}(N\mu, N\sigma^2) ,$$

i.e. the sum of the variables has a Normal distribution with mean $N\mu$ and variance $N\sigma^2$.
The mean of these variables ($\bar{Y}$) is also Normally distributed with

$$\bar{Y} = \frac{1}{N} \sum_{i=1}^{N} Y_i \sim \text{Normal}\left(\mu, \frac{\sigma^2}{N}\right) , \qquad (2.7)$$

i.e. the mean of the variables has a Normal distribution with the same expected value, $\mu$, as
the original variables, and with a smaller variance than the original variables, equal to $\sigma^2/N$.
The square root of this latter quantity is known as the standard error of the mean, which
is used often in statistical tests. As we might expect, as the number of random variables (or
sample size), N, increases then the uncertainty associated with their mean, $\bar{Y}$, decreases.

In fact, the Central Limit Theorem (see Casella and Berger, 2002) states that the
distribution of the mean of any set of independent and identically distributed random variables
with finite variance will tend towards a Normal distribution, and this approximation becomes more accurate
as the number of random variables contributing to the mean increases. This theorem holds
even if the distributions of the individual random variables are not Normal. For example, 
suppose we have samples of 100 bean seeds, and we assess each seed for weevil infesta- 
tion. The mean rate of infestation in each sample may well have an approximate Normal 
distribution, although this would certainly not hold for the observations on the individual 
seeds, or for the means of small samples. This property means that in practice we fre- 
quently encounter observations with a distribution that is either Normal or approximately 
so, and hence we will usually make (and verify) this assumption. 
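
The bean seed illustration can be explored by simulation. The following Python sketch is an assumption-laden toy: the infestation probability of 0.3, the number of simulated samples and the function name `sample_infestation_rate` are all hypothetical choices.

```python
import random
import statistics

random.seed(1)  # fixed seed so that the sketch is reproducible

def sample_infestation_rate(n_seeds, p):
    """Proportion of infested seeds in one sample; each seed is a 0/1 outcome."""
    return sum(random.random() < p for _ in range(n_seeds)) / n_seeds

p = 0.3          # hypothetical infestation probability per seed
n_seeds = 100
rates = [sample_infestation_rate(n_seeds, p) for _ in range(5000)]

# The sample rates should centre on p with standard deviation sqrt(p(1 - p)/n)
centre = statistics.mean(rates)
sd = statistics.stdev(rates)
print(round(centre, 3), round(sd, 3))
print(round((p * (1 - p) / n_seeds) ** 0.5, 3))  # theoretical value, 0.046

# For a Normal distribution about 95% of values lie within two standard deviations
within_two_sd = sum(abs(r - centre) <= 2 * sd for r in rates) / len(rates)
print(round(within_two_sd, 3))
```

Although each seed contributes only a 0 or a 1, the simulated sample rates behave much like a Normal variable: roughly 95% of them fall within two standard deviations of their mean.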

2.2.4 Distributions Derived from Functions of Normal Random Variables 

Once we have made the assumption of a Normal distribution for our random variables, 
then several other distributions, derived from functions of these variables, become useful. 
We merely state some results here, in their simplest form, in order to give context to their 
use later. Full details and derivations of the distributions introduced below can be found 
in standard statistical texts such as Hoel (1984) or Wackerly et al. (2007). 

The chi-squared distribution is associated with sums of squared Normal random
variables. For a set of N independent random variables $Z_1 \ldots Z_N$, with a standard Normal
distribution, the sum of the squares of these variables has a chi-squared distribution with N
degrees of freedom, written as

$$\sum_{i=1}^{N} Z_i^2 \sim \chi^2_N .$$

The symbol $\chi^2_k$ indicates a chi-squared distribution on k degrees of freedom (df) for k > 0.
The df determines the mean (equal to k) and variance (equal to 2k) of the distribution,
which is defined for positive values only and is right-skewed (has a long tail on the
right-hand side of the distribution). In this context, the df are related to the number of
independent variables contributing to the sum. Now suppose we have a set of N independent
Normal random variables $Y_1 \ldots Y_N$ with common mean $\mu$ and variance $\sigma^2$. The unbiased
sample variance of this set, denoted as random variable $S^2$, has a scaled chi-squared
distribution with N - 1 df, i.e.

$$S^2 = \frac{1}{(N-1)} \sum_{i=1}^{N} (Y_i - \bar{Y})^2 \sim \frac{\sigma^2}{(N-1)}\, \chi^2_{N-1} .$$
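
Since a chi-squared variable on k df has mean k and variance 2k, this scaled result implies that $S^2$ has expected value $\sigma^2$ and variance $2\sigma^4/(N-1)$. The Python sketch below checks these two consequences by simulation; the population values, sample size and number of replicates are hypothetical.

```python
import random
import statistics

random.seed(2)  # fixed seed for reproducibility

mu, sigma, n = 10.0, 2.0, 5     # hypothetical population values and sample size
n_reps = 20000

# Unbiased sample variance S^2 for each simulated Normal sample of size n
s2_values = [
    statistics.variance([random.gauss(mu, sigma) for _ in range(n)])
    for _ in range(n_reps)
]

# Compare the simulated mean and variance of S^2 with their theoretical values
mean_s2 = statistics.mean(s2_values)
var_s2 = statistics.variance(s2_values)
print(round(mean_s2, 2), sigma**2)                  # simulated vs sigma^2 = 4
print(round(var_s2, 2), 2 * sigma**4 / (n - 1))     # simulated vs 2 sigma^4/(N-1) = 8
```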



The variables in this sum are no longer independent, due to centering by the sample mean,
and so the df of the distribution are reduced by one. We can rescale by factor $(N - 1)/\sigma^2$
to obtain a variable ($Q^2$) with an unscaled chi-squared distribution, as

$$Q^2 = \frac{(N-1)}{\sigma^2}\, S^2 = \frac{1}{\sigma^2} \sum_{i=1}^{N} (Y_i - \bar{Y})^2 \sim \chi^2_{N-1} .$$

Student's t-distribution is associated with test statistics calculated as the ratio of an
estimate of location to its standard error. In an abstract context, if a random variable Z has a
standard Normal distribution, i.e. $Z \sim \text{Normal}(0, 1)$, and V is an independent random
variable with a chi-squared distribution on $\nu$ df, i.e. $V \sim \chi^2_\nu$, then

$$T = \frac{Z}{\sqrt{V/\nu}} \sim t_\nu ,$$

where $t_\nu$ denotes a Student's t-distribution on $\nu$ df. The t-distribution is a bell-shaped
symmetric distribution, with mean zero, but with fatter tails and a flatter peak than the
Normal distribution (Figure 2.5). As the number of df becomes large, the t-distribution
converges towards the standard Normal distribution.

We usually meet this distribution in the context of a set of independent Normal random
variables $Y_1 \ldots Y_N$, with common mean $\mu$ and variance $\sigma^2$. Consider the following statistic:

$$\frac{(\bar{Y} - \mu)}{\sqrt{S^2/N}} = \frac{\sqrt{N}(\bar{Y} - \mu)}{\sqrt{\sigma^2 Q^2/(N-1)}} = \frac{\sqrt{N}(\bar{Y} - \mu)}{\sigma} \times \frac{1}{\sqrt{Q^2/(N-1)}} .$$

In the first step above, we have rewritten the divisor of the denominator (N) as a multiplier
of the numerator, and rewritten $S^2$ in terms of $Q^2$. In the second step, we associated the
square root of $\sigma^2$ with the numerator. From Equation 2.7 and results given above, we can
deduce that $\sqrt{N}(\bar{Y} - \mu)/\sigma$ has a standard Normal distribution, and we know $Q^2 \sim \chi^2_{N-1}$
so, as these quantities are independent, it follows that

$$\frac{(\bar{Y} - \mu)}{\sqrt{S^2/N}} \sim t_{N-1} . \qquad (2.8)$$

FIGURE 2.5
PDF for standard Normal distribution ($\mu$ = 0, $\sigma$ = 1, black line) and t-distribution with 2 df (grey line).

Here, the df are associated with the denominator, i.e. the estimated standard error of the
mean, $\sqrt{S^2/N}$. We use this result frequently, for example in Section 2.4.1. For calculation of
power (Chapter 10), we also use the result that for any constant c, $(Z + c)/\sqrt{V/\nu} \sim t_\nu(c)$,
where $t_\nu(c)$ denotes a non-central t-distribution with non-centrality parameter c.
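
The fatter tails of the t-distribution in Equation 2.8 can be seen in a small simulation. The sketch below uses samples of size N = 3, giving 2 df as in Figure 2.5; the population values, number of replicates and the function name `t_statistic` are hypothetical choices.

```python
import random
import statistics
from statistics import NormalDist

random.seed(3)  # fixed seed for reproducibility

mu, sigma, n = 0.0, 1.0, 3      # hypothetical values; n = 3 gives 2 df
n_reps = 20000

def t_statistic(sample, mu):
    """(sample mean - mu) divided by the estimated standard error sqrt(S^2 / N)."""
    mean = statistics.mean(sample)
    sem = (statistics.variance(sample) / len(sample)) ** 0.5
    return (mean - mu) / sem

t_values = [t_statistic([random.gauss(mu, sigma) for _ in range(n)], mu)
            for _ in range(n_reps)]

# Proportion of statistics beyond +/-1.96: 5% for a standard Normal variable,
# but considerably more for a t-distribution on 2 df
tail_t = sum(abs(t) > 1.96 for t in t_values) / n_reps
tail_normal = 2 * (1 - NormalDist().cdf(1.96))
print(round(tail_t, 3), round(tail_normal, 3))
```

With only 2 df, nearly one in five simulated statistics lies beyond ±1.96, which is why critical values from the t-distribution, rather than the Normal distribution, are needed for small samples.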

The F-distribution is associated with test statistics calculated as the ratio of two sums of
squares. In abstract, if $U_1$ and $U_2$ are independent chi-squared random variables with $d_1$
and $d_2$ df, respectively, then

$$\frac{U_1/d_1}{U_2/d_2} \sim F_{d_1, d_2} ,$$

where $F_{d_1, d_2}$ denotes an F-distribution with $d_1$ and $d_2$ df. This distribution is right-skewed
and is defined only for positive values.



2.3 From Sample Data to Conclusions about the Population 

An important role of statistics is to provide information on a population based on data 
obtained from a sample. This is what is commonly known as statistical inference. Two 
types of inference are described below: point and interval estimation and hypothesis 
testing. 

2.3.1 Estimating Population Parameters Using Summary Statistics 

We have already seen that we can interpret the sample mean and unbiased sample vari- 
ance as estimates of the mean and variance of the underlying distribution of the popula- 
tion. Point estimation corresponds to this process of calculating (or estimating) a single 
summary value (or statistic) from the sample data that constitutes our 'best guess' of a 
population parameter. By convention, Greek letters are used to denote population param- 
eters and sample statistics are denoted with 'equivalent' lower case Roman letters. The 
parameters for the population mean and variance, and their unbiased sample estimates 
(defined in Section 2.1), are usually labelled as shown in Table 2.2. 

There is always uncertainty in the parameter estimates because of variability in the 
sampling process, and frequently because measurements themselves are imprecise. This 
uncertainty should also be estimated and reported by a standard error (SE). For example, 
we estimate the population mean using the sample mean, and we write this as $\hat{\mu} = \bar{y}$.
Equation 2.7 states that the distribution of the mean of a sample from a Normal
population is itself Normal, with mean equal to the population mean and variance equal to $\sigma^2/N$.
This distribution can be interpreted as the set of outcomes that can be achieved by taking
independent samples from the underlying population. The standard error of the estimate
is then the standard deviation of this distribution, written as $\text{SE}(\hat{\mu}) = \sigma/\sqrt{N}$. When $\sigma$ is
unknown, as is usually the case, we substitute its estimate, s, to get an estimated SE which
we write as

$$\widehat{\text{SE}}(\hat{\mu}) = s/\sqrt{N} .$$

The hat over SE emphasizes the fact that the SE is itself estimated. This quantity is known
as the estimated standard error of the mean, which we denote SEM.

TABLE 2.2
Notation for Population Parameters and Their Sample Statistics

Statistic             Population    Sample
Mean                  $\mu$         $\bar{y}$
Variance              $\sigma^2$    $s^2$
Standard deviation    $\sigma$      $s$

Whereas point estimation provides a single estimate of a population parameter, interval
estimation provides a range of estimates within which the parameter is likely to occur.
Confidence intervals (CIs) are the most common example of interval estimates. For
example, consider a 95% CI for the mean of a Normal random variable Y, with known variance
$\sigma^2$. Using a sample of size N, $y_1 \ldots y_N$, with mean $\bar{y}$, a 95% CI takes the form

$$\left( \bar{y} - 1.960 \times \frac{\sigma}{\sqrt{N}} ,\ \ \bar{y} + 1.960 \times \frac{\sigma}{\sqrt{N}} \right) , \qquad (2.9)$$

where 1.960 corresponds to the 97.5th percentile of the standard Normal distribution. The left-hand
value is the lower limit and the right-hand value is the upper limit of the CI. This interval
is derived from the property of the random variable that

$$\text{Prob}\left( -1.960 < \frac{\bar{Y} - \mu}{\sigma/\sqrt{N}} < 1.960 \right) = 0.95 ,$$

where $\bar{Y}$ is the mean of a hypothetical sample from Y of size N. Note that we obtain the 95%
coverage property by excluding 2.5% of the distribution in each tail. Unfortunately, since
an actual sample mean is a fixed quantity, we cannot make this same probability statement
about the CI in Equation 2.9 - the calculated CI either does or does not contain the
population mean. The probabilistic interpretation of this CI is that, if we repeat the study many
times, then 95% of our calculated CIs will contain the population mean. More information
about the derivation of CIs can be found in standard statistical texts, for example Wackerly
et al. (2007).
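
The repeated-study interpretation can be illustrated by simulation. The Python sketch below computes the CI of Equation 2.9 for many hypothetical studies and counts how often the interval contains the population mean; the function name `normal_ci_known_variance` and all numerical settings are our own assumptions.

```python
import random
import statistics
from statistics import NormalDist

def normal_ci_known_variance(sample, sigma, level=0.95):
    """CI for the mean of a Normal sample with known sigma (as in Equation 2.9)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # about 1.960 for a 95% CI
    half_width = z * sigma / len(sample) ** 0.5
    centre = statistics.mean(sample)
    return centre - half_width, centre + half_width

# Repeat a hypothetical study many times and count how often the CI contains mu
random.seed(4)
mu, sigma, n, n_studies = 50.0, 5.0, 10, 10000
covered = 0
for _ in range(n_studies):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    lower, upper = normal_ci_known_variance(sample, sigma)
    covered += lower <= mu <= upper
print(covered / n_studies)   # close to 0.95
```

Any single interval either contains the population mean or does not; it is the long-run proportion of intervals containing the mean that is close to 95%.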



2.3.2 Asking Questions about the Data: Hypothesis Testing 

Hypothesis testing is a form of inference where pairs of hypotheses for a population
parameter are compared using the information from a random sample of observations.
The two hypotheses are defined as the null (denoted $H_0$) and the alternative (denoted $H_1$
or, sometimes, $H_a$). The null hypothesis $H_0$ represents the status quo, and we accept (cannot
reject) this unless we obtain strong evidence from our sample that it is false. The
hypothesis $H_1$ represents an alternative state that contradicts the null hypothesis and may be
one-sided or two-sided. With a two-sided $H_1$ we do not specify a direction for the
alternative hypothesis - we are just interested in detecting whether the status quo is implausible.
With a one-sided $H_1$ we are interested in detecting deviations in a particular direction. For
example, consider the situation where a scientist has obtained samples of aphids from two
adjacent fields, one sprayed and one unsprayed, and wants to know if there are differences
in resistance to the applied pesticide. The null hypothesis assumes no difference in
resistance between the two fields, and the alternative hypothesis might be that the population
from the sprayed field shows greater resistance than the other (one-sided) or simply that
resistances in the two fields are not the same (two-sided).

To test the null hypothesis against the alternative, we first need to identify an appropriate
test statistic which has a known statistical distribution when the null hypothesis is true. The
actual statistic used depends on the characteristics of the problem, but you may already be
familiar with some common statistical tests, such as the t-test and the F-test, and the test
statistics associated with them. Having constructed a test statistic, the final step in the
hypothesis testing process involves assessing the consistency of the observed test statistic with the
null hypothesis. Under the null hypothesis (i.e. on the assumption that the null hypothesis
is true), if the probability (P) of obtaining a test statistic as extreme as the observed value is
small then we have statistical evidence against the null hypothesis. The test statistic thus
assesses how well the data support the null hypothesis. The strength of this evidence is
quantified by the observed significance level of the test, denoted above as P and sometimes
called the P-value. It is good practice to predetermine the level of significance required to
reject the null hypothesis (denoted $\alpha_s$); this is often chosen as 5% ($\alpha_s$ = 0.05). This is known
as the Type I error rate, and represents the probability of rejecting the null hypothesis when in
fact it is true. More details of hypothesis testing, including the concepts of Type II error and
statistical power, are discussed in Chapter 10. The calculations associated with the test
statistics used in hypothesis tests can also be used to derive CIs (Section 2.3.1) for population
parameters, i.e. an indication of the likely range of values for the quantity of interest, taking
account of the uncertainty associated with the estimate of the population parameter. These
concepts are illustrated in more detail in the examples below.



2.4 Simple Tests for Population Means 

In the following sections we describe the one- and two-sample t-tests, which are used 
extensively later in the contexts of regression modelling and treatment comparisons. 

2.4.1 Assessing the Mean Response: The One-Sample t-Test 

When we collect a single sample of observations we often want to make inferences about 
the value of the unknown mean of the population from which we have drawn the sample. 
We assume that we have a sample of N independent observations from a single population,
$y_1 \ldots y_N$. Suppose we wish to test the null hypothesis that the population mean is
equal to a given value, i.e. $H_0$: $\mu = c$, against a general two-sided alternative
hypothesis, i.e. $H_1$: $\mu \neq c$, where c is some pre-determined constant value (referred
to as a two-sided test, see Section 2.3.2).

We can estimate the population mean, $\mu$, by the sample mean, $\bar{y}$ (Equation 2.1). Usually,
the population variance is also unknown and is estimated by the unbiased sample variance,
$s^2$ (Equation 2.3). The null hypothesis is then evaluated using a one-sample t-test,
with test statistic, t, computed as

$$ t = \frac{\hat{\mu} - c}{\mathrm{SE}(\hat{\mu} - c)} = \frac{\bar{y} - c}{s/\sqrt{N}} , $$

i.e. as the ratio of the difference between the estimated mean and the constant c and the
estimated standard error of that difference. As c is fixed and known, it has no uncertainty
and so, in this case, the standard error of the difference between $\bar{y}$ and c is simply equal
to the standard error of $\bar{y}$ and estimated by SEM = $s/\sqrt{N}$, as in Section 2.3.1. If the null
hypothesis is true then the population mean is equal to c and the t-statistic should be close
to zero. If it is not then this gives evidence against the null hypothesis. For a given
significance level, $\alpha_s$, we can determine whether the data provide sufficient evidence against
the null hypothesis by comparing our test statistic with a critical value from the appropriate
distribution.

If $H_0$ is true and the observations are Normally distributed (or at least approximately
so) then the observed test statistic t follows a Student's t-distribution with N - 1 df, as in
Equation 2.8. The critical value defines a threshold such that test statistics more extreme
than this value occur with probability $\alpha_s$ when the null hypothesis is true. Because we are
considering a two-sided test, both large positive and large negative values of the test
statistic are unlikely under $H_0$, so we consider the probabilities associated with extreme values
in both tails of the t-distribution. We denote $t_{N-1}^{[\alpha_s/2]}$ to be the
$100(1 - \alpha_s/2)$th percentile of the t-distribution with N - 1 df, and use this as our
critical value. For example, if we choose $\alpha_s = 0.05$, then $t_{N-1}^{[0.025]}$, the 97.5th
percentile of this distribution, is the critical value. From the definition of percentiles
(Section 2.2.1) and the symmetry of the t-distribution about zero it follows that, for a random
variable $t_{N-1}$ with a t-distribution on N - 1 df,

$$ \mathrm{Prob}(t_{N-1} < -t_{N-1}^{[\alpha_s/2]}) = \mathrm{Prob}(t_{N-1} > t_{N-1}^{[\alpha_s/2]}) = \alpha_s/2 . $$

So $\mathrm{Prob}(|t_{N-1}| \geq t_{N-1}^{[\alpha_s/2]}) = \alpha_s$, i.e. the probability of equalling or
exceeding the critical value is $\alpha_s$, as required. We reject the null hypothesis if the
absolute value of our test statistic, denoted |t|, meets or exceeds this value. This situation
is illustrated in Figure 2.6.

The observed significance level of this test is calculated as
$P = \mathrm{Prob}(|t_{N-1}| \geq |t|)$, i.e. the probability under $H_0$ of obtaining a result at
least as extreme (larger positive or negative) as that observed. If the observed significance
level is less than the pre-determined significance level $\alpha_s$ then the null hypothesis is
rejected.




FIGURE 2.6

Critical regions for a two-sided one-sample t-test with probability level $\alpha_s$. Shaded area
covers $100\alpha_s$% of distribution containing the most extreme values. |t| is the absolute value
of an observed t-statistic greater than the critical value at significance level $\alpha_s$.

The test can also be used to calculate a $100(1 - \alpha_s)$% CI (Section 2.3.1) for the population
mean $\mu$ as the range

$$ \left( \bar{y} - (t_{N-1}^{[\alpha_s/2]} \times \mathrm{SEM}) ,\; \bar{y} + (t_{N-1}^{[\alpha_s/2]} \times \mathrm{SEM}) \right) . $$

Note that if the t-test is not significant at level $\alpha_s$, i.e. if the data are consistent with
the null hypothesis, then this $100(1 - \alpha_s)$% CI will contain the hypothesized value, c, as a
plausible value for the population mean. Conversely, if the null hypothesis is rejected, then
this CI will not contain c, as it is then an unlikely value for the population mean.
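
The one-sample procedure can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a replacement for statistical software: it uses the seven commercial-variety yields listed in Table 2.3, a hypothesized mean of c = 9 t/ha, and the tabulated critical value 2.447 (97.5th percentile of the t-distribution with 6 df) rather than computing percentiles itself. The variable names are our own.

```python
import math
import statistics

yields = [7, 9, 6, 12, 4, 6, 9]  # commercial variety, t/ha (Table 2.3)
c = 9.0                          # hypothesized population mean
t_crit = 2.447                   # tabulated 97.5th percentile, 6 df

n = len(yields)
mean = statistics.mean(yields)
sem = math.sqrt(statistics.variance(yields) / n)  # s / sqrt(N), unbiased variance
t_stat = (mean - c) / sem

# Two-sided test: reject H0 if |t| meets or exceeds the critical value
reject = abs(t_stat) >= t_crit
ci = (mean - t_crit * sem, mean + t_crit * sem)

print(f"mean = {mean:.2f}, SEM = {sem:.3f}, t = {t_stat:.3f}")
print(f"reject H0 at the 5% level: {reject}")
print(f"95% CI: ({ci[0]:.1f}, {ci[1]:.1f})")
```

Because the CI and the test use the same critical value and standard error, the CI contains c exactly when the test fails to reject the null hypothesis.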

EXAMPLE 2.1B: WHEAT YIELDS*

Consider the set of seven yield measurements presented in Example 2.1A. Historical
records indicate that for these field plots, the expected yield should be close to 9 tonnes
per hectare (t/ha). Because the mean yield from this harvest was 7.6 t/ha there is concern that
this year is atypical. To evaluate this, we perform a one-sample t-test with null hypothesis
$H_0$: $\mu = 9$, and alternative hypothesis $H_1$: $\mu \neq 9$. Recall that the unbiased sample
variance was 6.95. The t-statistic is calculated by substituting the (unrounded) sample
statistics into the t-test formula as

$$ t = \frac{7.57 - 9}{\sqrt{6.95/7}} = \frac{-1.43}{0.997} = -1.433 . $$

This statistic has N - 1 = 6 df. The critical value is the 97.5th percentile value of the
t-distribution with 6 df, equal to 2.447. Because the absolute value of the test statistic
(|t| = 1.433) is smaller than the critical value, we fail to reject the null hypothesis at the
5% significance level and conclude that there is not enough evidence to indicate that this
year is atypical.

A 95% CI for the population mean can be calculated as

$$ \left( 7.57 - (2.447 \times \sqrt{6.95/7}) ,\; 7.57 + (2.447 \times \sqrt{6.95/7}) \right) = (5.1, 10.0) . $$

As expected from the results of the significance test, this CI contains the expected yield
of 9 t/ha.

2.4.2 Comparing Mean Responses: The Two-Sample t-Test

We now consider the case in which we have two treatment groups and wish to test for
differences between the population means for the two treatments. In this case the null
hypothesis ($H_0$) is that the population means for the two treatment groups are equal, and
the two-sided alternative hypothesis ($H_1$) is that the two population means are different.
If we label the population means for the two treatments as $\mu_1$ and $\mu_2$, respectively, these
hypotheses can be written as

$$ H_0: \mu_1 = \mu_2 \quad \text{and} \quad H_1: \mu_1 \neq \mu_2 . $$

We have again specified a two-sided test, but we could specify a one-sided alternative
hypothesis if that were appropriate. The population means for the two treatment groups
are estimated by their sample means. The test compares the difference between these two
sample means with an estimate of the uncertainty in this difference due to background
variation. If the difference in sample means is large compared to the background variation
then there is evidence for a difference between the two population means.

Suppose we have a sample of $n_1$ observations for treatment group 1 (denoted $y_{1k}$,
$k = 1 \ldots n_1$), and a sample of $n_2$ observations for group 2 (denoted $y_{2k}$,
$k = 1 \ldots n_2$). The estimates for the group means are calculated as



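$$ \hat{\mu}_1 = \bar{y}_{1\cdot} = \frac{1}{n_1} \sum_{k=1}^{n_1} y_{1k} \quad \text{and} \quad \hat{\mu}_2 = \bar{y}_{2\cdot} = \frac{1}{n_2} \sum_{k=1}^{n_2} y_{2k} , $$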




with the dot notation as defined in Section 2.1. If we can reasonably assume that the
background variation is the same for the two groups, then the background variability is
estimated by a weighted sum of the unbiased within-group sample variances, also known as
a pooled estimate of variance

$$ s^2_{\mathrm{pooled}} = \frac{(n_1 - 1) \times s_1^2 + (n_2 - 1) \times s_2^2}{n_1 + n_2 - 2} , $$

where the unbiased within-group sample variances are calculated as usual (see Equation
2.3) as

$$ s_1^2 = \frac{1}{(n_1 - 1)} \sum_{k=1}^{n_1} (y_{1k} - \bar{y}_{1\cdot})^2 \quad \text{and} \quad s_2^2 = \frac{1}{(n_2 - 1)} \sum_{k=1}^{n_2} (y_{2k} - \bar{y}_{2\cdot})^2 . $$

If the treatment variances cannot be assumed equal, then a modified statistic must be used
(see for example Wackerly et al., 2007). The df associated with the pooled estimate of
variance are equal to the total number of elements in the two summations, $N = n_1 + n_2$, minus
the number of treatment means estimated, here two; hence, the df are $n_1 + n_2 - 2 = N - 2$.

The estimated standard error of the difference between the two sample means, denoted
SED, can then be calculated as

$$ \mathrm{SED} = \mathrm{SE}(\hat{\mu}_1 - \hat{\mu}_2) = \sqrt{s^2_{\mathrm{pooled}} \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} . $$

The null hypothesis is then evaluated using a two-sample t-test with the test statistic, t,
computed as

$$ t = \frac{\hat{\mu}_1 - \hat{\mu}_2}{\mathrm{SE}(\hat{\mu}_1 - \hat{\mu}_2)} = \frac{\bar{y}_{1\cdot} - \bar{y}_{2\cdot}}{\mathrm{SED}} , \qquad (2.10) $$

i.e. as the difference between the estimated population means, divided by the estimated
uncertainty in that difference due to the background variability. If the data follow a
Normal distribution and the observations are independent then under the null hypothesis
this test statistic has a t-distribution with N - 2 df. This distribution can be used to obtain
critical values, as for the one-sample t-test, and to obtain the observed significance level.
Alternatively (and equivalently), the squared value of the test statistic has an F-distribution
on 1 and N - 2 df. This distribution is useful later when we have more than two groups to
compare (Section 4.3).

As for the one-sample case, the t-distribution can be used to construct a CI, but now for
the difference in population means, $\mu_1 - \mu_2$. A $100(1 - \alpha_s)$% CI for this difference can be
calculated as

$$ \left( (\bar{y}_{1\cdot} - \bar{y}_{2\cdot}) - (t_{N-2}^{[\alpha_s/2]} \times \mathrm{SED}) ,\; (\bar{y}_{1\cdot} - \bar{y}_{2\cdot}) + (t_{N-2}^{[\alpha_s/2]} \times \mathrm{SED}) \right) . $$

If the t-test is not significant at level $\alpha_s$, i.e. if the data are consistent with the null
hypothesis, then this $100(1 - \alpha_s)$% CI will contain zero as a plausible value for the
difference in population means. Conversely, if the null hypothesis is rejected, then this CI
will not contain zero, as it is then an unlikely value for the difference between the two
population means.



EXAMPLE 2.1C: WHEAT YIELDS* 

A standard commercial and a new 'improved' wheat variety are to be compared using 
yield measurements obtained from 14 plots in a field trial. The objective of the study 
is to determine whether the varieties produce different average yields. For each vari- 
ety, yields were obtained from n = 7 small plots, and converted into tonnes per hectare 
(t/ha). The data for the commercial variety were analysed in Examples 2.1A and B, and 
the complete data set can be found in Table 2.3 and in file wheat.dat. 

The hypotheses can be stated as 

$H_0$: both varieties give the same mean yield
$H_1$: the varieties give different mean yields.

In mathematical terms this can be written as

$H_0$: $\mu_1 = \mu_2$ (the population means are equal)

$H_1$: $\mu_1 \neq \mu_2$ (the population means are not equal)



TABLE 2.3

Yield Measurements (in t/ha) from a Standard Commercial and an
Improved Wheat Variety Obtained from Plots in a Field Trial
(Example 2.1C and file wheat.dat)

Plot   Variety      Yield      Plot   Variety    Yield
1      Commercial   7          8      Improved   12
2      Commercial   9          9      Improved   8
3      Commercial   6          10     Improved   12
4      Commercial   12         11     Improved   9
5      Commercial   4          12     Improved   8
6      Commercial   6          13     Improved   16
7      Commercial   9          14     Improved   7

Here, $\mu_1$ corresponds to the population mean of the commercial variety, and $\mu_2$ to that
of the improved variety. To test the null hypothesis we substitute the summary statistics
for these data into the t-test formula (Equation 2.10). The sample means for each variety
are $\bar{y}_{1\cdot} = 7.57$ (from Example 2.1A) and

$$ \bar{y}_{2\cdot} = \frac{12 + 8 + 12 + 9 + 8 + 16 + 7}{7} = 10.29 . $$



The unbiased sample variances are $s_1^2 = 6.952$ and $s_2^2 = 10.238$, with pooled variance

$$ s^2_{\mathrm{pooled}} = \frac{(7 - 1) \times 6.952 + (7 - 1) \times 10.238}{(7 + 7 - 2)} = 8.595 , $$

giving

$$ \mathrm{SED} = \sqrt{8.595 \times \frac{2}{7}} = 1.567 . $$

Finally, the observed test statistic is

$$ t = \frac{\bar{y}_{1\cdot} - \bar{y}_{2\cdot}}{\mathrm{SED}} = \frac{7.57 - 10.29}{1.567} = \frac{-2.71}{1.567} = -1.732 . $$

The absolute value of the test statistic (|t| = 1.732) is compared with a critical value
of the t-distribution with 7 + 7 - 2 = 12 df. If we set $\alpha_s = 0.05$ (i.e. use a 5% significance
level) then the critical value for this two-sided test is $t_{12}^{[0.025]} = 2.179$ (the 97.5th
percentile of the t-distribution with 12 df). As 1.732 is less than 2.179, we cannot reject $H_0$. We
might report that there is insufficient statistical evidence to conclude that the mean
yields of the commercial and improved varieties are different. Alternatively, we might
report the observed significance level for this test statistic as P = 0.109, obtained from
$t_{12}^{[0.0545]} = 1.732$. Again, because this value is larger than our chosen significance level of
$\alpha_s = 0.05$, we cannot reject $H_0$. A 95% CI for the difference in population means
($\mu_1 - \mu_2$) is

$$ \left( -2.714 - (2.179 \times 1.567) ,\; -2.714 + (2.179 \times 1.567) \right) = (-6.1, 0.7) . $$

It follows that zero is a plausible value for the difference in population means as it is
contained in the CI, agreeing with the result of the hypothesis test.
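
The calculations in this example can be reproduced with the following standard-library Python sketch. It is an illustration under stated assumptions rather than a definitive implementation: the helper `t_two_sided_p` is our own, and obtains the observed significance level by numerically integrating the t density with Simpson's rule (statistical software would normally supply this directly); the critical value 2.179 is taken from tables.

```python
import math
import statistics


def t_two_sided_p(t, df, steps=2000):
    """Two-sided P-value for a t-statistic, via Simpson's rule on the t density."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central = total * h / 3  # Prob(0 < T < |t|)
    return 1 - 2 * central


commercial = [7, 9, 6, 12, 4, 6, 9]   # t/ha, Table 2.3
improved = [12, 8, 12, 9, 8, 16, 7]   # t/ha, Table 2.3
n1, n2 = len(commercial), len(improved)
df = n1 + n2 - 2

# Pooled variance: within-group unbiased variances weighted by their df
s2_pooled = ((n1 - 1) * statistics.variance(commercial)
             + (n2 - 1) * statistics.variance(improved)) / df
sed = math.sqrt(s2_pooled * (1 / n1 + 1 / n2))
diff = statistics.mean(commercial) - statistics.mean(improved)
t_stat = diff / sed
p_value = t_two_sided_p(t_stat, df)

t_crit = 2.179  # tabulated 97.5th percentile, 12 df
ci = (diff - t_crit * sed, diff + t_crit * sed)

print(f"pooled variance = {s2_pooled:.3f}, SED = {sed:.3f}")
print(f"t = {t_stat:.3f} on {df} df, P = {p_value:.3f}")
print(f"95% CI for difference: ({ci[0]:.1f}, {ci[1]:.1f})")
```

Working from the unrounded data in this way avoids the small rounding differences that arise when intermediate results are rounded by hand.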

Note that to have achieved significance at the 5% level the test statistic, t, would have
needed to satisfy |t| > 2.179. If the unbiased sample variance is assumed equal to the
pooled estimate, $s^2_{\mathrm{pooled}}$, then this requires

$$ \frac{|\bar{y}_{1\cdot} - \bar{y}_{2\cdot}|}{1.567} > 2.179 , $$

so that the difference between the two variety means would have needed to be at least
as large as 3.41 t/ha (3.41 = 1.567 x 2.179). This large difference is required because the
pooled variance is relatively large, and the total number of observations is small, and
hence there is considerable uncertainty in the estimates of the population means for the
two varieties.

2.5 Assessing the Association between Variables

When two variables have been measured on the same material, it is often of interest to
know whether the variables are independent or whether they show some association. For
example, the diameter and weight of seeds would be expected to show a strong positive
association such that seeds with larger diameters also have larger weights. Covariance and
correlation are both measures of the strength and direction of a relationship between two
variables. Correlation is a more useful measure, because it uses a standardized scale and
is independent of the scale of measurement. However, correlation is often defined in terms
of covariance, so we discuss the latter first.

Suppose we have observations of two variables, here labelled x and y, measured on the
same units so the data consist of N pairs ($x_i$, $y_i$). The unbiased sample covariance between
the two variables, $s_{xy}$, is defined as

$$ s_{xy} = \frac{1}{(N - 1)} \sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y}) , \qquad (2.11) $$



i.e. we centre both variables, calculate the product of the centred variables for each unit,
and then sum the resulting values over all units and divide by N - 1. This is an unbiased
estimate of the population covariance between these two variables, denoted $\sigma_{xy}$. If both
variables tend to be large on the same units, then the covariance will be large and positive.
If one variable tends to be large when the other is small then the covariance will be large
and negative. If the variables are completely unrelated then the covariance will be close to
zero. The formula for the sample covariance in Equation 2.11 has a similar form to that for
the sample variance in Equation 2.3, and it is easy to verify that the covariance of a variable
with itself is simply its unbiased sample variance. Analogous to Equation 2.2, the sample
covariance would use the divisor N in Equation 2.11 in place of N - 1, but gives a biased
estimate of the population covariance.

Where we have several random variables measured on the same units, we can assemble
their variances and covariances into a single structure, called the variance-covariance
matrix. This is a symmetric matrix with rows and columns indexed by the variables. The
variances are held on the diagonal, and the covariances are held in the off-diagonal positions.
For example, for three variables, x, y and z, the variance-covariance matrix takes the form

$$ \begin{pmatrix} \sigma_x^2 & \sigma_{xy} & \sigma_{xz} \\ \sigma_{yx} & \sigma_y^2 & \sigma_{yz} \\ \sigma_{zx} & \sigma_{zy} & \sigma_z^2 \end{pmatrix} . \qquad (2.12) $$



The matrix is symmetric because covariances are invariant to the order in which the
variables are specified, so that $\sigma_{xy} = \sigma_{yx}$ and so on. For this reason, it is sufficient to
present only the values on or below the diagonal (the lower triangle). The unbiased sample
variance-covariance matrix replaces the population values by the unbiased sample variances
and covariances.
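
As a minimal sketch of how an unbiased sample variance-covariance matrix is assembled, the Python code below applies Equation 2.11 to every pair of variables. The measurements are hypothetical values invented for the illustration, and the function name `sample_cov` is our own.

```python
def sample_cov(u, v):
    """Unbiased sample covariance (divisor N - 1), as in Equation 2.11."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    return sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v)) / (n - 1)


# Hypothetical measurements of three variables on the same five units
data = {
    "x": [2.1, 2.5, 3.0, 3.4, 4.0],
    "y": [10.2, 11.0, 12.9, 13.1, 15.3],
    "z": [7.7, 7.1, 6.9, 6.0, 5.8],
}
names = list(data)

# Rows and columns are indexed by the variables; the diagonal holds variances
vcov = [[sample_cov(data[a], data[b]) for b in names] for a in names]

for name, row in zip(names, vcov):
    print(name, [round(value, 3) for value in row])
```

The diagonal entries are the covariance of each variable with itself, i.e. its unbiased sample variance, and the matrix is symmetric because the covariance does not depend on the order of the two variables.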

Unfortunately, covariance is strongly dependent on scale - if you convert a set of
measurements from centimetres to inches then the covariance will also change, although
relationships between the variables clearly do not change. Correlation is a standardized
measure of the strength and direction of a relationship between two variables that is
independent of the scale of measurement. In general statistical usage, correlation quantifies
the departure of two variables from independence, and many correlation coefficients
have been defined.

We use Pearson's product-moment correlation coefficient, which is derived directly
from the covariance, and is appropriate for estimating correlation between two variables
with an underlying bivariate Normal distribution defined by the population means and
standard deviations of each variable and the population correlation coefficient, $\rho$. This
coefficient measures the strength of a linear relationship between the variables and can
be estimated by the sample correlation coefficient, r, calculated as the unbiased sample
covariance divided by the product of the unbiased sample standard deviations. If we write
$s_y$ and $s_x$ to be the unbiased sample standard deviations of variables y and x, respectively,
the sample correlation coefficient is defined as

$$ r = \frac{s_{xy}}{s_x \times s_y} . $$

The value of r has no units and lies between -1 and +1. A few examples are illustrated in
Figure 2.7. In Figure 2.7a, a strong positive correlation is shown (r = 0.96). If the
observations lie exactly on a straight line with a positive slope, then x and y are perfectly
correlated (r = 1), a concept that is known as collinearity where one variable can be completely
determined from the other. In Figure 2.7b, a weaker negative correlation is pictured (r = -0.57).
When r = -1 the paired observations also lie exactly on a straight line but with a negative
slope. Both Figures 2.7c and d show a weak sample correlation between the variables (r
close to 0). For Figure 2.7c the points show a random scatter, whilst for Figure 2.7d there
is a clear relationship between x and y, but one that is not linear. When two variables are
independent, so that a value of y does not depend in any way on the value of x, then their
sample correlation coefficient will be close to zero; however, the converse is not necessarily
true (as shown in Figure 2.7d) because the correlation coefficient detects only linear
dependencies between two variables. Hence, a scatter plot of the variables should always
be considered alongside the summary value of r.
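
Two properties described above can be illustrated numerically: covariance changes with the units of measurement whereas correlation does not, and a clear non-linear relationship can give a correlation of zero. The sketch below uses hypothetical seed diameters and weights invented for the illustration; the function names are our own.

```python
import math


def sample_cov(u, v):
    """Unbiased sample covariance (divisor N - 1)."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    return sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v)) / (n - 1)


def sample_corr(u, v):
    """Sample correlation: covariance over the product of standard deviations."""
    return sample_cov(u, v) / math.sqrt(sample_cov(u, u) * sample_cov(v, v))


# Hypothetical seed diameters (cm) and weights (mg)
diam_cm = [0.30, 0.35, 0.41, 0.44, 0.52, 0.58]
weight_mg = [21.0, 26.5, 30.1, 35.8, 41.2, 49.9]
diam_in = [d / 2.54 for d in diam_cm]  # same diameters in inches

cov_cm, cov_in = sample_cov(diam_cm, weight_mg), sample_cov(diam_in, weight_mg)
r_cm, r_in = sample_corr(diam_cm, weight_mg), sample_corr(diam_in, weight_mg)
print(f"covariance: {cov_cm:.4f} (cm) vs {cov_in:.4f} (inches)")
print(f"correlation: {r_cm:.4f} (cm) vs {r_in:.4f} (inches)")

# A clear but non-linear relationship: y = x^2 on values symmetric about zero
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v * v for v in x]
r_curve = sample_corr(x, y)
print(f"correlation for y = x^2: {r_curve:.4f}")
```

In the second case y is completely determined by x, yet the linear correlation is zero, which is why a scatter plot is needed alongside r.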

For more than two variables, it is conventional to present the set of pairwise correlations in 
matrix form, similar to the variance-covariance matrix in Equation 2.12 but with value 1 on 
the diagonal and correlation coefficients for pairs of variables in the off-diagonal positions. 

It is important to understand that strong correlation does not necessarily imply causation.
If x and y are correlated, then there are four possibilities to consider: (1) x causes y; (2)
y causes x; (3) a third variable, z, influences both x and y; or (4) there is no relationship but
by chance an atypical joint sample has been produced. For this reason, causal conclusions
should not be drawn from correlations without further information or experimentation,
or both.

It is sometimes of interest to investigate whether there is any statistical evidence of
correlation between two variables. Here, the null hypothesis states that the population
correlation coefficient is equal to zero, i.e. $H_0$: $\rho = 0$, and is tested against a two-sided
alternative hypothesis that the correlation is non-zero, i.e. $H_1$: $\rho \neq 0$. The null hypothesis
is evaluated using a t-statistic based on the sample correlation, calculated as

$$ t = r \sqrt{\frac{N - 2}{1 - r^2}} . $$

FIGURE 2.7

Scatter plots illustrating correlation patterns between two variables: (a) strong positive correlation; (b) moderate
negative correlation; (c) uncorrelated and unrelated variables; (d) uncorrelated but related variables.



If the two variables have a Normal distribution, then under the null hypothesis this sta- 
tistic has a t-distribution with N - 2 df. The significance level of the test depends on both 
the sample correlation and the sample size: a given value of r becomes more significant as 
N increases. This test can also be used for variables with a non-Normal distribution, but 
the distribution of the test statistic is then approximate and may be inaccurate unless the 
sample size is large. 

EXAMPLE 2.3B: WILLOW BEETLE MEASUREMENTS 

The lengths and widths (mm) of the sample of willow beetles described in 
Example 2.3A are plotted in Figure 2.8. Some points represent multiple observa- 
tions, with the area of each plotted symbol proportional to the number of observa- 
tions represented. It seems clear that there is a positive relationship between the two 
variables, although there is one beetle much wider than would be expected from 
its length (point in bottom right of plot).

FIGURE 2.8

Length (mm) plotted against width (mm) for 50 willow beetles (Example 2.3B). Area of points is proportional to
the number of observations at that position.

The unbiased sample variance-covariance matrix takes the form

$$ \begin{pmatrix} s_L^2 & s_{LW} \\ s_{WL} & s_W^2 \end{pmatrix} = \begin{pmatrix} 0.02595 & 0.00586 \\ 0.00586 & 0.00434 \end{pmatrix} , $$

where we use the subscript L to indicate length and W to indicate width measurements.
The sample correlation between length and width is therefore calculated as



$$ r = \frac{s_{LW}}{s_L \times s_W} = \frac{0.00586}{\sqrt{0.02595} \times \sqrt{0.00434}} = 0.5528 . $$

Hence, we observe a positive correlation between these traits of 0.55. The distribution
of these variables is consistent with a Normal distribution (e.g. see Figure 2.3), and so
we formally test whether this result is consistent with an uncorrelated population. We
calculate the t-statistic as

$$ t = r \sqrt{\frac{N - 2}{1 - r^2}} = 0.5528 \times \sqrt{\frac{48}{0.6944}} = 4.596 . $$

The value 4.596 is then evaluated against a t-distribution with 48 df, giving observed
significance level P < 0.001. Hence, we have strong evidence that the population
correlation coefficient between length and width is not zero.
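
The correlation and its t-statistic can be recomputed from the variance and covariance values quoted in this example. This sketch starts from the rounded values 0.02595, 0.00434 and 0.00586 and N = 50, so its results differ slightly in the last decimal places from those obtained from the unrounded data; the variable names are our own.

```python
import math

n = 50                # number of beetles
var_length = 0.02595  # unbiased sample variance of length
var_width = 0.00434   # unbiased sample variance of width
cov_lw = 0.00586      # unbiased sample covariance of length and width

# Sample correlation: covariance divided by the product of standard deviations
r = cov_lw / (math.sqrt(var_length) * math.sqrt(var_width))

# t-statistic for H0: population correlation is zero, on N - 2 df
t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))

print(f"r = {r:.4f}, t = {t_stat:.3f} on {n - 2} df")
```

The resulting statistic is then compared with the t-distribution on 48 df, as described above.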



2.6 Presenting Numerical Results 

The presentation of numerical results is fraught with difficulties because various, some- 
what arbitrary, choices have to be made on rounding and the number of significant figures 
shown. In this section we describe the conventions that we try to follow in this book. This 
should make our written calculations easier to follow, as well as suggesting general guide- 
lines for use in other contexts, such as scientific publications. 

Where we show calculations in the text we have necessarily rounded the numbers pre- 
sented, and this includes any intermediate results. However, to get answers that match 
statistical software, we have not actually implemented this rounding in our calculations; 
we have always retained full accuracy. This means that there will be small differences
(usually in the last decimal place) between the answers we present and those achieved by
the calculations shown. Similarly, we round quantities presented in ANOVA tables (see
Chapter 4) but do all of our calculations based on unrounded quantities.

In presenting the results of hypothesis tests, we show test statistics, critical values and
observed significance levels to three decimal places. This gives sufficient accuracy for most
situations.

The definition of a suitable scale for data-dependent quantities is more difficult. For any
variate, we first identify the scale at which we see variation within it, and we call this
its natural granularity. This may be related to the size of the numbers, but this is not
always the case, and so we consider several examples, starting with the yields recorded
in Example 2.1 (Table 2.3). These are all integer values in the range 4-16. All of the
information is captured by these numbers with zero decimal places and so we say that this is
the natural granularity for these observations. Exactly the same argument holds for the
integer count data of Example 2.2. However, the situation is slightly different for Example
2.3 (Table 2.1). Here both the beetle lengths (4.10-4.95 mm) and widths (1.40-1.725 mm) are
measured to the nearest 0.025 mm. Accurate representation of the measurements requires
three decimal places, but this implies much more accuracy than is actually present, and so
we denote the natural granularity as two decimal places (0.01) even though this still gives
slightly greater accuracy than is present in the measurements.

A different situation holds for many machine-calculated measurements, for example
where small-plot yields have been converted into acre or hectare yields, or observations
have been transformed to or from the logarithmic scale. In these cases, the conversion
between scales can introduce many superfluous decimal places. All of the decimal places
(up to rounding error) should be retained prior to analysis, so that no accuracy is lost if any
further transformation is required (e.g. as described in Chapter 6). At the point of analysis,
we suggest that the natural granularity is decided from the range of the data, calculated
as the minimum value subtracted from the maximum value, and by use of the first three
significant figures of this range to define the natural granularity for the presentation of
results. For example, data on 0-100 (range 100) has natural granularity of 1 (zero decimal
places); data on 1000-1010 (range 10) has its natural granularity defined as 0.1 (one decimal
place). If the natural granularity is greater than 1, we do not generally advocate rounding
values, but it may be worth rescaling the data (dividing by the natural granularity) to
avoid the appearance of spurious accuracy. Similarly, if there is no variation in the leading
significant figures, it may make sense to remove those digits. Occasionally these guidelines
may fail and, in that case, common sense should be applied.
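
One possible reading of the range-based rule is sketched below in Python. It is our interpretation, not a prescribed algorithm: the natural granularity is taken as the power of ten of the third significant figure of the data range, which gives zero decimal places for data on 0-100 and one decimal place for data on 1000-1010. The function names are hypothetical.

```python
import math


def natural_granularity_exponent(values):
    """Power of ten of the third significant figure of the data range."""
    data_range = max(values) - min(values)
    if data_range <= 0:
        raise ValueError("range must be positive to define a granularity")
    leading = math.floor(math.log10(data_range))  # position of first significant figure
    return leading - 2                            # third figure is two places to the right


def decimal_places(values):
    """Number of decimal places implied by the natural granularity."""
    return max(0, -natural_granularity_exponent(values))


for label, data in [("0-100", [0, 100]), ("1000-1010", [1000, 1010])]:
    exponent = natural_granularity_exponent(data)
    print(f"data on {label}: granularity 10^{exponent}, "
          f"{decimal_places(data)} decimal place(s)")
```

As the text notes, a rule of this kind can fail for unusual data, so the result should be treated as a starting point and checked against common sense.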

Once the natural granularity of a data set has been defined, it can be used to define a
sensible scale for reporting statistics. We should report sample means and estimates of other
location parameters to one decimal place more than the natural granularity. We should
report sums of squares, mean squares, and estimates of variances and standard deviations
or errors to two decimal places more than the natural granularity. We use greater accuracy
for standard deviations and errors because multiples of these quantities are often used to
compare differences between location estimates (e.g. group means). A suitable scale for a
regression coefficient is harder to define as it depends on the scale of its explanatory
variate. Here, we suggest using sufficient decimal places for such coefficients to ensure that we
can report the fitted values to one decimal place more than the natural granularity, with
additional precision on coefficient standard errors. We have often found ourselves breaking
these rules for convenience of presentation in this book, but nevertheless recommend that
you think carefully about the appropriate level of precision for your own circumstances.

EXERCISES

2.1* A plant ecologist is interested in the distribution of one species of grass
within a field. She investigates this by throwing a 0.1 m² quadrat to 20 random
positions in the field and counting the number of plants of the species in
the quadrat at each position. The counts for the 20 quadrats were: 15, 12, 6, 7,
4, 2, 10, 14, 3, 6, 9, 9, 2, 11, 10, 3, 2, 11, 9 and 10. File grass.dat contains the unit
number (variate Quadrat) and plant count (variate Count) for each quadrat.
Consider whether these data should be considered as continuous or discrete,
and draw a bar chart or histogram (as appropriate). Obtain the sample mean,
median and inter-quartile range. What can you say about the distribution of
these data?

2.2 Obtain a histogram of the beetle widths (mm) given in Table 2.1 (and variate
Width in file willow.dat). Do these data seem consistent with a Normal
distribution, as asserted in Example 2.3B?

2.3* The one-sample t-test is rarely used in analysis of experimental data, except
in the context of regression, but it can be useful for analysis of paired samples
from a set of subjects. In this scenario, the two-sample t-test is not valid
because two samples from a single subject are not independent. However, if
we analyse the differences between the samples from each subject, we can
use a one-sample t-test to test the null hypothesis of no difference between
samples.

An experiment made measurements of Rubisco protein (on a relative scale)
in 12 grass plants before and after a drought stress period of five days. File
protein.dat contains the unit number (DPlant) and Rubisco measurements
(variates Before and After) for each plant. Calculate the change in amount of Rubisco
protein in each plant and analyse this change using a two-sided one-sample
t-test. Write down the null and alternative hypotheses for this test and interpret
them in the context of this experiment. Is there any evidence that the amount of
Rubisco has changed after five days of drought stress?

2.4* A soil scientist sampled two fields to get background measurements of carbon
biomass (measured as mg C per kg of soil) prior to a field experiment. Six
samples were taken from each field: the samples from the first field gave 910, 1058,
929, 1103, 1056, 1022 mg C kg⁻¹; the samples from the second field gave 1255,
1121, 1111, 1192, 1074, 1415 mg C kg⁻¹. File carbon.dat contains the unit number
(Sample), field number (factor Field) and carbon biomass measurement
(variate Carbon) for each sample. Use a two-sided two-sample t-test to test whether
there is any difference in average biomass between the two fields, and calculate
a 95% CI for the difference.

2.5 In Example 12.1 (Tables 12.1 and A.1), we describe an experiment in which
several morphological traits were measured on 190 seeds from a line of diploid
wheat. Two of the traits measured on each seed were length (mm) and weight
(mg). The unit numbers (DSeed) and length and weight measurements
(variates Length and Weight) can be found in file triticum.dat. Produce a scatter
plot of these two traits and calculate the unbiased sample variances and
covariances between them. Derive their sample correlation coefficient r. Is there
evidence of association between these two variables?

This chapter presents the basic concepts that are required to construct designs to address
directly and efficiently the aims of a biological experiment. We first discuss the choice of
treatments and materials; treatments should be determined by the aims and the materials
should be chosen according to the frame of reference for the experiment (Section 3.1).
These two components must then be combined to produce an appropriate design. A good
design takes proper consideration of three statistical principles: replication (Section 3.1.1),
randomization (Section 3.1.2) and blocking (Section 3.1.3), to reduce bias and maximize the
precision of treatment comparisons. We describe the structure of a design with respect to
underlying factors using a symbolic form (Section 3.2). Many experiments will use one
of the wide and flexible family of standard designs, such as the completely randomized
design (CRD) (Section 3.3.1), the randomized complete block design (RCBD) (Section 3.3.2),
the Latin square (LS) design (Section 3.3.3), the split-plot (SP) design (Section 3.3.4) or the
balanced incomplete block design (BIBD) (Section 3.3.5). Once the structure of the design
has been determined, a properly randomized layout can be generated with statistical
software (Section 3.3.6).



3.1 Key Principles 

As described in Chapter 1, an experimental study investigates the relationship between
an outcome and one or more conditions that are manipulated by the researcher. Before
considering the appropriate design for any experiment, it is important to be clear about its
aims, which are usually associated with one or more scientific questions or hypotheses to
be tested. Examples of such questions might be

• Is any reduction in disease infection achieved with a new 'resistant' variety
compared with a standard 'control' variety?

• How do plant metabolites respond to increasing drought stress at different stages
of development?

• Which chemicals, of several under study, show insecticidal activity?

• How is yield related to plant spacing, and does this relationship vary between
varieties?

The aims of the experiment should be well defined to make it easy to assess whether
the chosen treatments are sufficient to achieve them. In this context, the term treatments
is used to describe the set of different experimental conditions to be tested,
for example, varieties, nitrogen rates, or chemical compounds, or, more usually, combinations
of several such classifying variables. Control treatments - either positive or
negative controls - may be used to provide a baseline, or to verify that the experiment
has worked as expected. A negative control usually corresponds to a 'null' treatment,
and a positive control usually corresponds to a standard treatment with a known effect.
This is discussed further in Section 8.5. In addition to defining the experimental treatments,
the experimental units must be chosen. The experimental unit for a treatment
is defined as 'the smallest division of the experimental material such that any two units
may receive different treatments in the actual experiment' (Cox, 1992). For some treatments,
this may be larger than the size of unit on which individual observations are
recorded (sometimes called the observational or measurement unit), and may occur at
a range of scales, as in the following examples.

• An area of land on a farm. A field trial typically has numerous small plots, and
experimental treatments are applied to the individual plots. The experimental
unit is the plot, and the measurement unit may be either the plot (e.g. yield) or
sub-samples from the plot area (e.g. individual plant measurements).

• Individual soil samples taken from a field. In the context of a field trial with treatments
applied to plots, if a single soil sample is taken from each field plot for processing
in the lab, then the soil sample becomes the experimental unit. If multiple
soil samples are taken from each plot, then the experimental units are the sets of
samples from each plot.

• Pots, each containing three plants. If experimental treatments, such as soil nutrient
content, are applied to whole pots, then the pot is the experimental unit. The
measurement unit may be either the whole pot (e.g. combined biomass) or individual
plants.

• Different leaves from an individual plant. In the investigation of the response of plants
to aphid attack, clip cages with or without aphids might be attached to individual
leaves within a plant. The experimental unit is then the individual leaf.

• Samples of RNA extracted from different plants. Investigation of gene expression often
involves the application of different treatments to individual plants, followed by
extraction of RNA from each plant. The experimental unit for further study is then
the RNA taken from an individual plant.

• A batch of 10 insects in a Petri dish. Experiments on small insects are often done on
groups of insects kept together in dishes (or cages), with treatments applied to
the dishes. The experimental unit is then the dish. The measurement unit may
be the dish, via a summary of insect behaviour such as percent survival, or the
individual insects.

Recall from Section 1.2 that the experimental units are considered to consist of a sample
from a wider population for which inferences can be made, and that this population
should be identified according to the frame of reference for the experiment. For example,
if RNA samples for microarray work are taken from only a single plant, then conclusions
regarding gene expression in the plant population cannot safely be made without further
experimentation, because variation between different plants would be expected. If samples
are taken from several randomly selected plants, then variation between plants can be
accounted for, and inferences can be applied to the wider population. A similar situation
occurs when an experiment is established in a single site, or in a single year, or on a single
variety, as there can be no certainty that results can be safely extrapolated to wider
circumstances. This issue is especially relevant to field trials, where the chance peculiarities of a
single environment can produce anomalous results; for this reason, many journals will not
publish the results of field trials that have not been repeated over several sites or seasons or
both. It is therefore important to recognize the frame of reference implied by the choice of
experimental units, so that appropriate conclusions can be drawn from the results.

Although we wish to have a representative sample of experimental units, we can get
more precise estimates of differences between treatments by making comparisons across
similar units. We can deal with this apparent contradiction by using sets of reasonably
homogeneous experimental units, and then repeating the comparison across a wider range
of circumstances. In some cases, the experimental units may have some intrinsic structure,
introducing some heterogeneity between groups of more homogeneous units. This structure
should be incorporated in both the choice of experimental units for treatment application
and in the statistical analysis. For example, experimental materials may be arranged
as plants within pots within trays, giving three structural levels, and we expect different
levels of variation within each of these levels. Depending on practical considerations and
the aims of the experiment, it may be appropriate to apply treatments at any of these levels,
and a statistical analysis should account for this structure.

The choice of experimental units and treatments should be made separately: units are
chosen according to the appropriate frame of reference for the experiment, and treatments
are chosen to enable the hypotheses to be tested. A good design then matches the treatments
with the units so that the treatment differences can be estimated without bias (i.e.
without systematic over- or under-estimation) and as precisely as possible (i.e. to minimize
uncertainty in the results). Our main tool to avoid experimental bias is randomization, i.e.
the random allocation of treatments to experimental units, and experimental precision
can be improved by the use of proper replication and blocking (terms we discuss in more
detail below).

First it is helpful to recall the role of the underlying unit-to-unit variation in biological
experimentation. It is well known that biological individuals vary in any given characteristic
or response. The amount of variation may depend on several factors, such as differing
genetic backgrounds and environmental effects, but some variation is always present. This
natural variation may be inflated by uncertainty introduced by the measurement process
in cases where exact measurement is not possible (also known as measurement error).
This combined background variation is a potential cause of both bias and uncertainty in
experimental results. For example, if two treatments are each applied to one plant only, it
is not possible to assess whether any difference in the measured response is due to treatment
differences or natural plant-to-plant variation. Statistical design and analysis aim to
distinguish, quantify and, subsequently, compare variation between treatments (signal)
with background variation (noise). A large signal:noise ratio indicates that substantive
treatment differences are present. A small signal:noise ratio indicates that any apparent
treatment differences could be explained by the background variation in the system, and
therefore cannot confidently be attributed to treatment effects. Proper identification and
control of background variation is thus an essential aim of any statistical design.

A good design considers each of the three basic principles: replication, randomization
and blocking. Replication is the process of applying each treatment to more than one
experimental unit, so the number of replicates of a treatment is the number of independent
experimental units to which that treatment is applied. Randomization means the random
allocation of treatments to experimental units and is used to ensure the fair assessment of
treatments without bias. For this reason, it can be regarded as an insurance against potential
unknown differences between units, and it should be used whenever possible. In some
circumstances, it may be possible to identify or construct groups of experimental units
expected to have similar responses in the absence of any treatment effects. This process is
known as blocking. A block is a subset of the experimental material within which experimental
units are expected to be homogeneous, with more heterogeneity expected between
experimental units in different blocks. In the analysis of experimental data, variation due
to blocks can be separated from background variation and, if there are differences among
blocks, this separation will increase the precision of treatment comparisons by reducing
the estimate of the unit-to-unit background variability. An appropriate use of these three
design principles will give confidence that any treatment differences observed are real and
not due to some chance combination of circumstances, and will also enable the maximum
amount of information to be obtained from the available resources. We now discuss each
of these principles in more detail.

3.1.1 Replication 

The natural background variation among experimental units means that it is necessary to
replicate the application of each treatment to several experimental units. This replication
serves two important purposes. First, by repeating each treatment on several experimental
units, we get a more reliable estimate of the effect of each treatment. Second, and possibly
more importantly, the replicated observations provide an estimate of the background variation
between units, which we can use to assess whether treatments differ and to indicate
the precision associated with the estimated treatment effect. Usually each treatment will
be replicated an equal number of times, but in circumstances where particular treatments
are of greater interest, it may be advantageous to have increased replication for those
treatments. Conversely, reduced replication may be used where resources for particular
treatments are either scarce or expensive, for example, seed for a new breeding line.

To illustrate some issues regarding replication, consider an experiment to compare two
pesticide treatments (a standard and a new formulation) applied to six insect-net cages,
each cage containing 10 aphids. The new formulation is applied to three cages selected
at random, with the standard formulation applied to the remaining three cages. The
replication of each treatment in this experiment is only three, even though 30 aphids have
been treated with each pesticide and even if the measurement unit is the individual aphid,
because each treatment is applied to a cage of aphids and so this is the experimental unit.
The individual aphids here are an example of pseudo-replicates. Pseudo-replication
describes the situation in which multiple measurements are taken from each experimental
unit. This can be a very useful experimental technique, but must be properly incorporated
into any statistical analysis, which may otherwise produce an incorrect estimate
of the between-unit variability (usually too small), possibly leading to incorrect conclusions
about the importance of treatment effects. Pseudo-replication usually causes problems
when the smallest level of experimental material (i.e. the measurement unit, here,
the aphid) is wrongly identified as the experimental unit in a statistical analysis, in place
of the level at which treatments were actually applied (here, the cage). As a rule of thumb,
replication needs to occur at the level at which the treatments have been applied to be
considered 'real'. Consider the following examples of designs for experiments.

• Twelve pots, each containing four plants at the three-leaf stage, with six treatments
(A-F) each applied to two of the pots with the allocation made at random
(Figure 3.1). Treatments were applied to pots, and so the pot is the experimental
unit and the replication of each treatment is two. Measurements from individual
plants are pseudo-replicates.

FIGURE 3.1

Design for an experiment with four plants (•) in each of 12 pots with treatments (A-F) applied to pots.

FIGURE 3.2

A two-stage design. The first stage (left) is a field trial with 12 plots and two treatments (A and B). A soil sample
is taken from each plot and labelled by its treatment and replicate (A₁-A₆ and B₁-B₆). At the second stage, samples
from each treatment are mixed together (bulked) then sub-sampled. Sub-samples from each bulked sample
are labelled by lower-case letters and sample number and then measured.

• A field experiment consisting of 12 plots, with two treatments (A and B) each applied
to six of the plots selected at random (Figure 3.2). One soil sample was taken per plot,
and labelled by the treatment and replicate number, giving samples A₁ ... A₆ and
B₁ ... B₆. The soil samples from the six replicate plots for each treatment (e.g. A₁ ... A₆)
were bulked (combined) together and mixed thoroughly and six sub-samples taken
and measured. These sub-samples were labelled as a₁ ... a₆ and b₁ ... b₆. In this case,
although treatments were originally applied to plots, at the analysis stage there is
only a single replicate for each treatment because samples from independent plots
have been bulked and the sub-samples are not independent. The sub-samples are
pseudo-replicates and give information on the homogeneity of the bulked sample
rather than on the variation between plots. Ideally, samples from each plot should
have been kept separate, giving six true replicates for each treatment.

• Two controlled environment (CE) cabinets, one at 10°C and one at 20°C, each
containing eight seed trays, with two different watering regimes (A and B) each
applied to four trays chosen at random within each cabinet (Figure 3.3). Both
temperature and watering regime are considered as treatments here. The experimental
units for watering regime are the seed trays and each watering regime is
applied to eight independent seed trays, giving replication of eight. The experimental
units for temperature are the cabinets, and the temperature treatments are
unreplicated. To achieve replication of temperature, it would be necessary to use
another two cabinets, or to repeat the study under the same controlled conditions
with a new randomization of both factors.

FIGURE 3.3

Design for an experiment with eight trays, with two watering regimes (A and B) applied within each of two CE
rooms, with each room operating at a different temperature (10°C or 20°C).



The last of these examples shows that experimental units can occur at several different
levels within a structure, and may differ between treatment factors. Therefore, one level
of structure may represent pseudo-replicates for one type of treatment and real replicates
for another.

It is also important to draw the distinction between technical and biological replication.
Technical replication occurs when several measurements are taken from the same biological
material, while biological replication occurs when measurements are taken from several
independent biological subjects. The use of adequate biological replication is required
to make inferences valid for the population from which the samples were obtained, rather
than for a single individual. Technical replication is always pseudo-replication, but biological
replication may correspond to either pseudo-replication or true replication, depending
on the context. It is clearly important to recognize when measurements are pseudo-replicates
that do not increase treatment replication. Note, however, that technical replication
can be useful in increasing precision where measurement is seriously subject to error,
provided that it is properly accounted for in the statistical analysis. We give an example of
an analysis accounting for pseudo-replication in Section 7.5.
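
The consequence of confusing measurement units with experimental units can be illustrated with a small simulation of the aphid-cage experiment. The following is a minimal sketch, not the analysis of Section 7.5: the response values, the standard deviations and the helper names `simulate_cage` and `pooled_t` are all hypothetical, and a simple pooled two-sample t statistic is assumed. It contrasts an analysis of cage means (three replicates per treatment) with an analysis that wrongly treats each aphid as a replicate.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the illustration is reproducible


def simulate_cage(cage_mean, n_aphids=10, within_sd=1.0):
    """Return hypothetical responses for the aphids in one cage."""
    return [random.gauss(cage_mean, within_sd) for _ in range(n_aphids)]


# Hypothetical data: 3 cages per pesticide, 10 aphids per cage, no true
# treatment difference. Cage-to-cage variation (sd = 2) is assumed to be
# larger than aphid-to-aphid variation within a cage (sd = 1).
cages = {
    "standard": [simulate_cage(random.gauss(10.0, 2.0)) for _ in range(3)],
    "new": [simulate_cage(random.gauss(10.0, 2.0)) for _ in range(3)],
}


def pooled_t(x, y):
    """Two-sample t statistic with pooled variance, and its degrees of freedom."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / df
    se = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return (statistics.mean(x) - statistics.mean(y)) / se, df


# Appropriate: one summary value per experimental unit (the cage).
cage_means = {trt: [statistics.mean(c) for c in cs] for trt, cs in cages.items()}
t_cage, df_cage = pooled_t(cage_means["new"], cage_means["standard"])

# Inappropriate: every aphid treated as if it were an independent replicate.
aphids = {trt: [v for c in cs for v in c] for trt, cs in cages.items()}
t_aphid, df_aphid = pooled_t(aphids["new"], aphids["standard"])

print(f"cage means : t = {t_cage:.2f} on {df_cage} df")
print(f"aphid level: t = {t_aphid:.2f} on {df_aphid} df")
```

The cage-level analysis has 4 degrees of freedom, whereas the aphid-level analysis claims 58. The extra degrees of freedom come from pseudo-replicates and do not reflect additional independent applications of the treatments.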



3.1.2 Randomization 

Randomization is required to ensure the fair allocation of treatments to units to guard 
against bias, and to cope with the natural variation between experimental units. In the 
simplest case, randomization requires that each permutation of the set of treatments has 
an equal probability of occurring, so that (for equal replication) every experimental unit 
has an equal chance of receiving any treatment. Hence, each treatment is equally likely to 
be applied to 'good' units as to 'bad' units. Where randomization has not taken place, there 
will always be a question about possible bias in the experiment. 

To obtain a proper randomization for a given design, a method is required for assigning 
treatments to experimental units at random. We use the convention that treatments get 
assigned to experimental units, though the opposite approach can also be used. So, for 
example, for an experiment comprising two treatments each replicated six times, we might 
write A and B on six pieces of paper each, to represent the six replicates of each treatment, 
and put these in a bag and draw (without replacement and without looking) to obtain a 
sequence which allocates treatments to units 1 ... 12. Alternative approaches could use 
six tokens of each of two different colours, or six playing cards from each of two different
suits, in the same way. Random number tables can also be used for allocating treatments
to units, though care is needed to define the protocol where too many repeats of a treatment
occur in the random sequence. However, most randomizations are now done via
statistical software. The mechanics of the process are unimportant as long as the property
of equal probability for each permutation of treatments is preserved - this is discussed
further in Example 3.1. Randomization ensures that any (possibly unconscious) bias of the
experimenter (e.g. a tendency to assign the biggest plants to their favoured treatment) is
avoided and that any unknown differences between the units are unlikely to consistently
favour particular treatments.
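
As an illustration of how software can mimic drawing labelled pieces of paper from a bag, the sketch below shuffles the complete list of treatment labels and assigns them to units 1 ... 12 in order. The function name `randomize` and the seed value are hypothetical choices for this example. It assumes that Python's standard `random.shuffle` treats all orderings as equally likely, which holds to a very good approximation for a list of this size.

```python
import random


def randomize(treatments, replicates, seed=None):
    """Return a random allocation of treatment labels to units 1..N.

    Shuffling the full list of labels is the software analogue of drawing
    labelled pieces of paper from a bag without replacement.
    """
    rng = random.Random(seed)
    labels = [t for t in treatments for _ in range(replicates)]
    rng.shuffle(labels)
    return dict(enumerate(labels, start=1))  # unit number -> treatment


# Two treatments, each replicated six times, allocated to units 1..12.
plan = randomize(["A", "B"], replicates=6, seed=2024)
for unit, treatment in plan.items():
    print(unit, treatment)
```

Recording the seed makes the layout reproducible for documentation purposes. A different seed (or none) should be used each time a new randomization is required.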

To reinforce objectivity in some areas of research, particularly medical research, trials
are carried out as either single- or double-blind trials. In a single-blind trial, the subject
does not know which treatment has been allocated, while in a double-blind trial, neither
the subject nor the investigator knows which treatment has been applied. In plant science,
the perception of the subject is not considered relevant. However, the perception of the
investigator measuring or assessing the experimental material could be influenced
(possibly unconsciously) by their expectation of the applied treatment. It is therefore good
practice to make experimental measurements (especially subjective assessments) without
knowledge of the treatment allocation as far as possible. For example, in field trials, this
can sometimes be achieved by the use of a field plan with plot numbers marked but not
treatments.

Randomization leads to estimates of treatment differences that are unbiased when
considered across the whole set of possible randomizations. However, this does not
guarantee that any individual randomization will produce unbiased results. For example,
all instances of one treatment may be assigned to larger plants by chance. For this
reason, experimental units should be chosen to be as homogeneous as possible while
still being representative of the population of interest. Selecting homogeneous units has
the added advantage of reducing the background variation or noise. Where it is not possible
to select a completely homogeneous set of experimental units, the units need to be
grouped into sets (blocks) of more homogeneous experimental units to avoid potential
bias (see Section 3.1.3). Sometimes, even where the units are thought to be homogeneous,
a randomization can give cause for concern. For example, for 12 pots arranged in a line
with two treatments (labelled A and B) each replicated six times, consider the following
randomization



AAAAAABBBBBB. 

This particular randomization does not look random, but can occur (with probability 1 in
924). If we are not happy to accept this randomization, it is probably an indication that we
do not consider the experimental units to be completely homogeneous, so that some sort
of blocking is needed.
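
The figure of 924 is the number of distinct ways of choosing which six of the 12 pots receive treatment A, each of which is equally likely under a valid randomization. A quick check, using the standard library function `math.comb`:

```python
from math import comb

# Number of distinct ways to choose which 6 of the 12 pots receive treatment A.
n_layouts = comb(12, 6)
print(n_layouts)  # 924
```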

EXAMPLE 3.1: RANDOMIZATION 

The efficacy of a new pesticide is to be tested in the field with 15 plots of size 5 m × 10 m
arranged in a 3 × 5 array. Five plots will be sprayed with the pesticide and 10 will be
untreated (controls) for comparison. In this case, extra replication of the control treatment
is used to obtain a good estimate of background variability and because the new
pesticide is available in only small amounts. We evaluate two methods of determining
a randomization for this experiment and consider whether each of the methods gives a
valid randomization, i.e. with equal probability for each permutation of treatment effects.

TABLE 3.1

Experimental Layout Achieved Using Randomization by Playing Cards for a Field Trial with 15 Plots (Numbered
1-15) and Two Treatments: A Pesticide Treatment (Labelled P) with Five Replicates and a Control Treatment
(Labelled C) with 10 Replicates (Example 3.1)

First, we use a pack of cards. We might take 15 cards: five red cards to represent the
pesticide treatment and 10 black cards to represent the control. We shuffle the cards
(randomization), then deal them out in shuffled order to allocate treatments to plots (in
order 1-15) to get, for example, the randomization shown in Table 3.1.

Is this a valid randomization? Let us consider the process. Assuming a fair shuffle,
when we pick the first card we have a probability of 5/15 of picking the pesticide treatment
for the first plot (and 10/15 of picking the control). These probabilities change as
the allocation proceeds. For example, if the first plot is allocated the pesticide treatment,
when we pick the second card we have a probability of 4/14 of the second plot also being
allocated the pesticide treatment (as we have four red cards out of 14 left), and so on. With
this method, the probability of plots 1-5 all being allocated the pesticide treatment is
1/3003 (= 5/15 × 4/14 × 3/13 × 2/12 × 1/11). This is the same, for example, as the probability
of plots 11-15 all being allocated the pesticide treatment (equivalent to the probability
of plots 1-10 being allocated the control, i.e. 10/15 × 9/14 × ... × 2/7 × 1/6 = 1/3003). There
are, in fact, 3003 possible permutations of five pesticide-treated plots and 10 control plots,
calculated as the factorial function of 15 (the number of plots) divided by the product of
the factorial function of five (the number of pesticide-treated plots) and the factorial function
of 10 (the number of control plots):

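\[
\frac{15!}{5! \times 10!}
= \frac{15 \times 14 \times 13 \times \cdots \times 2 \times 1}{(5 \times 4 \times 3 \times 2 \times 1) \times (10 \times 9 \times \cdots \times 2 \times 1)}
= \frac{15 \times 14 \times 13 \times 12 \times 11}{5 \times 4 \times 3 \times 2 \times 1}
= 3003 .
\]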

With this randomization approach, each of these permutations is equally likely. 
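
The step-by-step argument for the card method can be checked with exact fractions. In the sketch below, the function name `deal_probability` and the string encoding of a layout ('P' for pesticide, 'C' for control, one character per plot in order 1-15) are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb


def deal_probability(layout, n_pesticide=5, n_control=10):
    """Exact probability that a fair shuffle deals the given layout.

    `layout` is a string of 'P' (pesticide) and 'C' (control), one per plot.
    """
    remaining = {"P": n_pesticide, "C": n_control}
    prob = Fraction(1)
    for label in layout:
        cards_left = remaining["P"] + remaining["C"]
        # Chance that the next card dealt carries this label.
        prob *= Fraction(remaining[label], cards_left)
        remaining[label] -= 1
    return prob


first_five = deal_probability("PPPPP" + "C" * 10)
last_five = deal_probability("C" * 10 + "PPPPP")
print(first_five, last_five, comb(15, 5))  # 1/3003 1/3003 3003
```

Any other layout with five pesticide plots and 10 control plots gives the same value, because the same numerators and denominators appear in a different order.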

An alternative, and perhaps at first sight simpler, approach is to toss a coin to construct
the randomization, with heads corresponding to the pesticide treatment, and tails
corresponding to the control. Working through the plots one by one, we toss the coin once
for each plot, allocating the pesticide treatment to the plot if it comes up heads (subject
to a maximum of five pesticide plots), and allocating the control treatment to the plot if it
comes up tails (subject to a maximum of 10 control plots). Is this a valid randomization?
Let us again consider the process. We have a probability of 1/2 (assuming a fair coin) of
allocating pesticide to the first plot. The second coin toss takes no account of the allocation
for the first plot, so again we have probability 1/2 of pesticide being allocated to the
second plot, and so on. With this method, the probability of plots 1-5 being allocated
the pesticide treatment is 1/32 (= 1/2 × 1/2 × 1/2 × 1/2 × 1/2). In contrast, the probability
of plots 11-15 being allocated the pesticide treatment (i.e. plots 1-10 being allocated the
control) is (1/2)^10 (= 1/1024). The probabilities of these different permutations are obviously
not the same, and the probability of any particular permutation depends on how
we number the plots!

It is clear that the two processes are not equivalent. The 'coin tossing' approach gives
an invalid randomization, with different permutations having different probabilities.
By contrast, the 'card shuffling' approach associates the same probability with each
permutation, and therefore provides a valid randomization. This example illustrates some
of the issues that must be considered when you derive a randomization scheme, and
that are automatically accounted for by statistical software.
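
The unequal probabilities produced by coin tossing can be confirmed by enumerating all 3003 layouts. The sketch below assumes one particular reading of the procedure: a fair coin is tossed for each plot while both treatments are still available, and once one treatment has reached its maximum, all remaining plots are forced to receive the other. The function names `coin_probability` and `all_layouts` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def coin_probability(layout, n_pesticide=5, n_control=10):
    """Exact probability of a layout under the capped coin-tossing scheme."""
    remaining = {"P": n_pesticide, "C": n_control}
    prob = Fraction(1)
    for label in layout:
        if remaining[label] == 0:
            return Fraction(0)  # layout exceeds a maximum, so it cannot occur
        if remaining["P"] > 0 and remaining["C"] > 0:
            prob *= Fraction(1, 2)  # a genuine coin toss decides this plot
        # Otherwise one treatment is used up and the allocation is forced.
        remaining[label] -= 1
    return prob


def all_layouts(n_plots=15, n_pesticide=5):
    """Yield every layout with the required number of pesticide plots."""
    for chosen in combinations(range(n_plots), n_pesticide):
        yield "".join("P" if i in chosen else "C" for i in range(n_plots))


probs = {layout: coin_probability(layout) for layout in all_layouts()}
print(len(probs), sum(probs.values()))  # 3003 1
print(probs["PPPPP" + "C" * 10])        # 1/32
print(probs["C" * 10 + "PPPPP"])        # 1/1024
```

The probabilities sum to one, so the scheme always produces some layout with five pesticide plots, but the individual layouts are far from equally likely.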

Some other examples of randomization are presented in Figures 3.1 to 3.3 and in Section 
3.3. Note that if an experiment is repeated, then a new randomization should be generated 
each time the design is used; it is not statistically valid (or sensible) to generate a single 
randomization and then to use it repeatedly. 

3.1.3 Blocking 

It is desirable for the set of experimental units that are used to compare treatments to
be reasonably uniform (homogeneous) in their natural response, as this decreases our
estimate of the background variation, thus increasing precision and the potential for the
experiment to detect small treatment differences. So, if the experimental units are
intrinsically diverse (heterogeneous), then the experiment is likely to be insensitive. Further, as
noted in Section 3.1.2, using a set of homogeneous units increases the chances of a fair
comparison between treatments. However, it is not always possible to obtain a sufficient
number of homogeneous experimental units for a whole experiment and, even if it is possible,
it might not be desirable if it means restricting the frame of reference for the experiment.
In such cases, it might be possible to identify groups of experimental units such that
the units within each group are reasonably homogeneous, but with different underlying
responses between groups. These groups of units can then be considered as blocks within
the design. Blocking the units in this way potentially increases the precision of an experiment,
as comparisons between treatments within blocks are made against a more uniform
background. In this sense, blocking is said to be used for the control of variation, and for
this reason is also known as local control.

The term block originated in agricultural experiments, where a block corresponded to a
set of contiguous field plots; however, the specification of blocking can take more general
forms, including the recognition of any physical structure present in the experiment. We
often use the term 'block' as synonymous with 'structure'. Blocks may therefore be defined
according to proximity of units in space (e.g. neighbouring plots), proximity of units in
time (e.g. units measured in the same day or hour), units with similar physical characteristics
(e.g. size of plant, age of insect), or logistical factors (e.g. machine, technician). Note
that the number of units per block should ideally be determined by consideration of the
uniformity or structure of experimental units and not by what is convenient in relation to
the number of experimental treatments.

Consider the following examples of types and causes of heterogeneity among experimental
units, which can be addressed by the use of blocking.

• Field characteristics. A slope, or fertility or pH trend across a field, or local pest problems
(e.g. pigeons next to woodland) may be present. Blocks are usually formed
from sets of contiguous plots that are expected to be similar in as many respects
as possible. Occasionally, blocks may be formed from non-contiguous plots with
similar properties, for example, soil pH, but, in such cases, the other spatial characteristics
need to be reasonably homogeneous.

• Glasshouse characteristics. Differential shade or temperature due to positioning
with respect to walls and doors are common in glasshouses. Blocks are usually
formed from sets of trays or pots placed close together and hence in similar
environmental conditions.

• Time of measurement. Some experiments may be processed over a lengthy period,
and time of measurement may have a systematic effect on results. In the laboratory,
there may be a limit to the number of samples that can be measured in one batch,
and equipment may give slightly different readings on different days. In either case,
a set of units processed within the same time period can be considered as a block.

• Investigator. For subjective measures, such as visual scores, individuals often
perceive pre-determined scores differently. However, even in more objective situations,
for example, an investigator following a standard protocol, the use of subtly
different procedures can lead to systematic differences in results. If several different
investigators are scoring material or carrying out a laboratory process, then it
makes sense to regard each person as a block.

• Batches of chemical, of plants, or of other organisms such as insects. Again, if there is
any possibility of (even small) differences between batches, then batches should
be considered as blocks.

• General structure. There will often be a natural structure in experimental material.
For example, trays of plants may be held on shelves within a CE cabinet.
Conditions are more similar for plants within the same tray, for trays on the same
shelf, and for shelves within the same cabinet (in the case of several cabinets), so
all of these levels of structure should be considered as possible blocking factors.

In each of the examples above, information on the causes of heterogeneity is used to define 
blocks of reasonably homogeneous units and treatments can then be assigned at random 
to units within blocks. Note that each block might not be able to contain the full set of 
treatments (see Section 3.3.5), and that all blocks might not even contain the same numbers 
of experimental units. The randomization process needs to take account of the structure 
of the blocks, so that each treatment has the same probability of being applied to any unit 
within each block. If there are large differences between blocks, this also ensures a fairer 
allocation of treatments to units, as each treatment will occur within several (often all) 
blocks. For this reason, blocking can be seen as a set of restrictions on the randomiza- 
tion of treatments to the experimental units. We consider this in more detail for specific 
designs later (Section 3.3). Note that although blocking is generally intended to increase 
the precision of treatment comparisons where groups of heterogeneous units are present, 
the precision may decrease if too much blocking is used where there is no heterogeneity. 



3.2 Forms of Experimental Structure 

To successfully design, and later analyse, an experiment, it is necessary to identify all com- 
ponents of the experiment, i.e. both the treatments imposed and the structure of the units. 
In Section 1.3, we partitioned the systematic part of our mathematical model into two
components: the explanatory component describes the treatments present and the
structural component describes the blocking, or other structure, of the experimental
units. We describe both components using factors which label the different groups
present. Often, several factors are required to describe each component fully. For
example, in a CE experiment where trays were placed on shelves within cabinets, we need
factors to label each of the cabinets, shelves and trays to fully describe the structure.
Similarly, a set of experimental treatments may be constructed from an underlying set of
treatment factors. A factorial treatment structure consists of all possible experimental
treatments constructed by taking one level from each of a set of treatment factors; this
gives a particularly efficient form of design and is discussed further in Sections 8.2
and 8.3.

We write our model components using a symbolic notation similar to that commonly
used in statistical software. To use this notation effectively, we first need to understand
two different types of relationships between factors, namely nested and crossed structures.

Nested structures are used to describe hierarchical relationships. These most often 
occur within the structural component, but also occasionally within the explanatory 
component (e.g. see Section 8.4). A nested structure describes the situation where multi- 
ple units at one structural level are entirely contained within units at a higher level, and 
there is no direct relationship between units with the same label at the lower level. For 
example, consider an experiment with different treatments to be applied to four leaves 
(factor Leaf, with four levels) within each of 10 plants (factor Plant, with 10 levels). Leaves
within plants are the experimental units, and we consider the Leaf factor to be nested
within the Plant factor, written symbolically as Plant/Leaf. In this hierarchical structure, 
there is no association between leaves with the same label across plants, for example 
there is no association between leaf 1 on plant 1 and leaf 1 on plant 2. The / (forward 
slash) operator is used to indicate a nested relationship. In fact, this operator generates 
two separate model terms, as 



Plant/Leaf = Plant + Plant.Leaf

The first term consists of the Plant factor alone, and labels each of the 10 individual plants. 
In the second term, the . (dot) operator generates all combinations of levels of the two fac- 
tors, in this case labelling the 40 individual leaves. These two terms label the units within 
the two levels of the design. 

Crossed structures occur when two factors are used to classify experimental units 
both independently and simultaneously. This type of structure occurs frequently within 
both the explanatory and structural components of the model. For example, consider 
a laboratory experiment to examine an extraction procedure in which three different 
filtering methods (factor Filter, with three levels) are tested with four different reagents 
(factor Reagent, with four levels), giving 12 experimental treatments in total. Both
factors act simultaneously, and the crossed structure can be written as Filter*Reagent. In
a crossed structure, there is an association between units with the same level of either
factor. The * (star) operator indicates a crossed relationship and again generates several
model terms, as 



Filter*Reagent = Filter + Reagent + Filter.Reagent

The first two terms are the individual factors, and the third term labels all combinations 
of the two factors, here the 12 individual treatments. The interpretation of these terms is 
discussed further in Section 8.2.

In the structural component of the model, the terms generated describe different levels
of the design at which variation may occur; these different levels are known as strata. For
example, the crossed structure of a rectangular layout, Row*Column, generates three strata
(Row, Column and Row.Column) and the nested structure Plant/Leaf generates two strata
(Plant and Plant.Leaf).
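
The expansion rules for the two operators can be sketched in a few lines of Python. This is only an illustration of the notation, not part of any statistical package: the function names expand_nested, expand_crossed and label_combinations are hypothetical, and only the two-factor case described above is handled.

```python
from itertools import product

def expand_nested(outer, inner):
    """Terms generated by outer/inner, i.e. outer + outer.inner."""
    return [outer, outer + "." + inner]

def expand_crossed(a, b):
    """Terms generated by a*b, i.e. a + b + a.b."""
    return [a, b, a + "." + b]

def label_combinations(n_levels):
    """All combinations of levels (numbered from 1) for a dot term.

    n_levels maps each factor name to its number of levels.
    """
    return list(product(*(range(1, n + 1) for n in n_levels.values())))

plant_leaf_terms = expand_nested("Plant", "Leaf")
filter_reagent_terms = expand_crossed("Filter", "Reagent")

# Plant.Leaf labels the individual leaves; Filter.Reagent labels the treatments.
leaves = label_combinations({"Plant": 10, "Leaf": 4})
treatments = label_combinations({"Filter": 3, "Reagent": 4})

print(" + ".join(plant_leaf_terms), "->", len(leaves), "leaves")
print(" + ".join(filter_reagent_terms), "->", len(treatments), "treatments")
```

In a structural component, each term in the expanded list corresponds to one stratum, so the length of the list gives the number of strata.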

In general, either of the model components may contain nested or crossed relationships or 
both. Examples 3.2 and 3.3 describe some specific situations in the context of the structural 
component; in Chapter 8, we consider examples in the context of the explanatory component. 

EXAMPLE 3.2: NESTED STRUCTURAL FACTORS 

An experiment is set up with two identical CE rooms with three trays, each containing 
six pots, within each room (see Figure 3.4), with the potential to allocate different treat- 
ments at both the tray and pot levels. 

The CE rooms can be considered as the highest level of structure, with trays within 
rooms as the middle level, and pots within trays as the lowest level. This nested struc- 
ture thus has multiple units at any one level (e.g. trays) completely contained within 
each unit at the level above (e.g. rooms). We can verify that this is a nested structure by 
noting that there is no association between tray 1 in room 1 and tray 1 in room 2, and 
similarly no association between pots with the same label in different trays. The
structural factors can be denoted as Room (two levels), Tray (trays labelled within rooms,
three levels) and Pot (pots labelled within trays, six levels) and we can write this as 

Structural component: Room/Tray/Pot 

which can be expanded as 

Structural component: Room + Room.Tray + Room.Tray.Pot 

The three terms label the three strata, or levels of the hierarchy: the individual rooms 
(term Room), the individual trays (term Room.Tray) and the individual pots (term 
Room.Tray.Pot). The appropriate experimental unit for the application of any treatment 
must then be decided as a separate exercise. 
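
As a rough check on this structure, the units in each stratum can be enumerated directly. The sketch below is in Python and uses a hypothetical helper, stratum_size; it assumes, as in the example, that trays are numbered 1-3 within each room and pots 1-6 within each tray.

```python
from itertools import product

rooms = range(1, 3)  # two CE rooms
trays = range(1, 4)  # three trays, labelled within each room
pots = range(1, 7)   # six pots, labelled within each tray

# One record per pot, carrying the labels of all three structural factors.
units = [{"Room": r, "Tray": t, "Pot": p} for r, t, p in product(rooms, trays, pots)]

def stratum_size(units, factors):
    """Number of distinct units labelled by the combination of the given factors."""
    return len({tuple(u[f] for f in factors) for u in units})

strata = {
    "Room": stratum_size(units, ["Room"]),
    "Room.Tray": stratum_size(units, ["Room", "Tray"]),
    "Room.Tray.Pot": stratum_size(units, ["Room", "Tray", "Pot"]),
}
print(strata)  # {'Room': 2, 'Room.Tray': 6, 'Room.Tray.Pot': 36}
```

Because the labels are nested, the Tray factor on its own has only three distinct labels and does not identify a tray; the combination Room.Tray is needed to pick out each of the six trays.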




FIGURE 3.4 

A nested structure with six pots per tray and three trays per CE room (Example 3.2).

EXAMPLE 3.3: CROSSED AND NESTED STRUCTURAL FACTORS

Consider an experiment where a large set of plant samples are to be processed by a 
machine. There are two machines that could be used (factor Machine, with two levels) 
and two scientists available to do the work (factor Scientist, also with two levels). We 
might want to allow for potential differences in results both between scientists and 
between machines. If we want to separate the effects of the different scientists from 
the effects of the two machines, then both scientists must use both machines. So an 
appropriate design might allocate four sets of samples to be processed by the four 
machine-by-scientist combinations. Each of the machine-by-scientist combinations can
be considered as a block (see Figure 3.5). 

Because there may be an association between samples processed either by the same 
machine or by the same scientist, this is a crossed relationship. The structure of the four 
blocks can then be written with our symbolic notation as 

Machine*Scientist 



which can be expanded as 

Machine + Scientist + Machine.Scientist

These three terms, or strata, describe an overall effect for each machine, an overall effect 
for each scientist, and a combined effect for each machine-by-scientist combination. 
There is no association across samples processed by different machine-by-scientist 
combinations, and so samples can be considered to be nested within these blocks. The 
full structure can thus be written as

Structural component: (Machine*Scientist)/Sample 

and expanded as 

Structural component: Machine + Scientist + Machine.Scientist
+ Machine.Scientist.Sample



FIGURE 3.5

A crossed and nested structure with three samples within each of four machine-by-scientist combinations
(Example 3.3).

So the full structure has four strata: the three described above, plus a fourth that labels
the full set of individual samples. One advantage of using the crossed structure at the
higher level is that it allows us to establish whether differences between scientists and
machines are present, and to estimate their relative size and potential impact on the
results. This information can be useful in designing future experiments although, in
practice, we would require several repeats of this structure to get reliable estimates of
the different sources of variation. An alternative structure for this experiment might
combine (or, in statistical terminology, confound) the scientist and machine effects, so
that each scientist uses just one of the two machines. In this case, the effects of the two
scientists and the effects of the two machines would not be separable. Whether this is
important depends on the information required from the experiment: this confounded
blocking still achieves the main aim of separating block variation from the background
variation, but does not allow us to compare the relative influence of the scientists and
machines used in the process.

The symbolic notation for model formulae that we have used here was introduced by
Wilkinson and Rogers (1973) and is used within GenStat. Unfortunately, conventions for
model specification differ somewhat between statistical packages. For example, the R
package uses the : (colon) operator where we have used the . (dot) operator. Details can be
found by consulting software documentation.

To define a model fully, we need to specify in full both the explanatory and structural
components. For analysis, we also need to define the response variable to be analysed.
In the examples above, and throughout this book, we have included the individual units
within the structural model. This is not strictly necessary, as the individual units obviously
correspond to individual observations. However, we believe that retention of information
on the full design structure as well as the treatment factors from an experiment is good
practice. For example, unless we retain full information on the design, we cannot use all of
the diagnostic procedures described in Chapter 5; we cannot plot residuals on the
experimental layout if we do not know where each unit was placed. If a few pots in a corner of
a glasshouse behave differently to the rest, the cause may be obvious when observations
are plotted on the experimental layout but not when examined by treatment classification.
Unfortunately, it is not common practice to record this information, for example, we have
been unable to determine the full experimental layout for many of the examples in this
book. Where this is the case, we use dummy structural factors, for example, we use factor
DPot, to arbitrarily label pots, with the D prefix indicating a dummy factor. Treatment
factors can sometimes be used as dummy structural factors, but we believe that this practice
is confusing and prefer to avoid it. We discuss this further at the end of Section 7.3.

Finally, we mention two concepts often used to describe the structure of designs:
orthogonality and balance. Two factors are said to be orthogonal if the estimated effects for each
factor are the same regardless of whether the other term is included or not in the model. A
more rigorous mathematical definition of orthogonality is beyond the scope of this book
(but details can be found in Bailey, 2008). Most of the designs that we consider in this book
are orthogonal, and the concept and consequences of non-orthogonality are discussed in
Chapter 11. The concept of balance refers to information on treatment differences. In the
simplest case of an unstructured set of experimental units, a design is balanced if the
precision of all treatment comparisons is equal. For a structured set of units, a design is
balanced if the precision of all treatment comparisons is equal within each stratum. Most of
the designs that we consider in this book are balanced, and the complications introduced
by unbalanced designs are discussed within Chapters 11 and 16.



3.3 Common Forms of Design for Experiments

There are many types of statistical design for experiments, which differ from one another 
in their complexity and in their statistical properties. In the following sections, we describe 
briefly some common designs illustrated using a simple treatment structure (more com- 
plex treatment structures are considered in Chapter 8). 

3.3.1 The Completely Randomized Design 

The completely randomized design (CRD) is the simplest form of design and is appropri- 
ate if the experimental units are unstructured and homogeneous, so that there is no need 
for any form of blocking. The random allocation of treatments to experimental units is not 
constrained in any way, so that each treatment is equally likely to be allocated to each unit. 
This is the only case in which we omit the structural component from our model, as this 
comprises only a single factor which indexes each observation. 

EXAMPLE 3.4: CALCIUM POT TRIAL* 

An experiment was devised to investigate the effect of differences in soil calcium on 
the root growth of plants. The experimental material consisted of 20 pots, each con- 
taining one plant, arranged in a grid with four rows and five columns, under uniform 
controlled conditions. The treatments comprised four relative concentrations of calcium 
(A = 1, B = 5, C = 10, D = 20). Each treatment was applied to five pots selected at random 
to give a CRD. The layout for this design is shown in Table 3.2. 
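
A randomization of this kind can be sketched in Python as follows. The function name randomize_crd and the seed are illustrative choices, the assumption that pots are numbered 1-20 row by row is ours, and the plan produced will not match the one in Table 3.2.

```python
import random

def randomize_crd(treatments, replicates, seed=None):
    """Allocate each treatment to `replicates` units chosen completely at random."""
    rng = random.Random(seed)
    labels = [t for t in treatments for _ in range(replicates)]
    rng.shuffle(labels)  # unrestricted randomization over all units
    return {pot: label for pot, label in enumerate(labels, start=1)}

plan = randomize_crd(["A", "B", "C", "D"], replicates=5, seed=1)

# Print the plan as a grid of four rows and five columns of pots.
for row in range(4):
    print("  ".join(f"{pot:2d}:{plan[pot]}" for pot in range(5 * row + 1, 5 * row + 6)))
```

Fixing the seed makes the plan reproducible, which is useful for record keeping; a different seed gives a different, equally valid, randomization.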

The main advantages of this design are that it is easy to set up and has a simple form of 
analysis. It is also flexible, as the statistical analysis is still simple if the replication varies 
between treatments or if data are missing for some units. The CRD also provides maximal 
information on the background unit-to-unit variation, as none of the between-unit infor- 
mation is used to assess blocking. However, this is also a weakness of the design if hetero- 
geneity among units is present, as this heterogeneity will inflate the background variation 
and decrease the precision of estimates of treatment differences. The statistical analysis for 
this design is presented in Chapter 4. 



TABLE 3.2 

Randomization for the Calcium Pot Trial, with Pot 
Numbers (1-20), as a CRD with Four Treatments Labelled 
A-D, Each with Five Replicates (Example 3.4)



3.3.2 The Randomized Complete Block Design

The randomized complete block design (RCBD) is the simplest design that includes 
blocking and is probably the most frequently used design. In this design, the number of 
experimental units in each block must be the same as the number of treatments. Within
each block, treatments are then randomly assigned to experimental units with a different 
randomization for each block. The design is called complete because all treatments occur 
within each block. If we use factors Block to label the blocks, and Unit to label the units 
within each block, then this is a nested structure with two strata, represented using our 
symbolic notation as 

Structural component: Block/Unit 

or written in expanded form as 

Structural component: Block + Block. Unit 

EXAMPLE 3.5: POTATO YIELDS* 

An experiment was devised to investigate the effects of four different types of
fungicides (labelled F1, F2, F3, F4) on the yield of potatoes in field plots. An untreated control
treatment (labelled Control) was also included to give a baseline comparison. In the 
field designated for the trial, heterogeneity was thought to be present at large scales, 
but suitable blocks of five field plots could be identified and so a RCBD with four blocks 
each of five plots could be used. The randomized layout is shown in Table 3.3. The struc- 
tural component is written as 

Structural component: Block/Plot 
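
The restricted randomization for a design of this type might be sketched in Python as below. The function name randomize_rcbd and the seed are illustrative, and the resulting plan is not the one shown in Table 3.3.

```python
import random

def randomize_rcbd(treatments, n_blocks, seed=None):
    """Randomize the full set of treatments separately within each block."""
    rng = random.Random(seed)
    plan = {}
    for block in range(1, n_blocks + 1):
        order = list(treatments)
        rng.shuffle(order)   # a fresh, independent randomization for every block
        plan[block] = order  # order[i] is the treatment on plot i + 1 of this block
    return plan

treatments = ["Control", "F1", "F2", "F3", "F4"]
plan = randomize_rcbd(treatments, n_blocks=4, seed=2)

for block, order in plan.items():
    print(f"Block {block}: " + "  ".join(order))
```

Compared with a CRD, the only change is that shuffling is done within each block rather than across all 20 plots, which is what guarantees that every treatment occurs once per block.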

The RCBD is popular (and useful) because it includes some blocking to deal with hetero- 
geneity between experimental units, while still being straightforward to manage and with 
a simple statistical analysis. Because each treatment occurs once in each block, this design 
is both orthogonal (treatments are orthogonal to blocks) and balanced (all treatment com- 
parisons are made with equal precision). Details of statistical analysis for this design are 
presented in Chapter 7. A weakness of the design is that the block size must be equal to 
the number of treatments, and so the RCBD may be inefficient if the natural block size, as
determined by the experimental material, is smaller than the number of treatments. The
RCBD will also be inefficient if two independent sources of background heterogeneity are
present. We introduce appropriate designs for these situations (the balanced incomplete
block design and the Latin square design, respectively) below. 
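
The orthogonality of treatments and blocks in the RCBD can be checked numerically. The Python sketch below uses the proportional-frequency condition for two factors: the number of units in each combination of levels, multiplied by the total number of units, must equal the product of the two marginal counts. This condition is brought in here for illustration and is not the definition used in this book; the function name is_orthogonal and the small incomplete design are hypothetical.

```python
from collections import Counter

def is_orthogonal(pairs):
    """Check the proportional-frequency condition for two factors.

    pairs holds one (level of factor 1, level of factor 2) tuple per unit.
    """
    n = len(pairs)
    cell = Counter(pairs)
    first = Counter(a for a, _ in pairs)
    second = Counter(b for _, b in pairs)
    # Counter returns 0 for combinations that never occur.
    return all(cell[(a, b)] * n == first[a] * second[b]
               for a in first for b in second)

# RCBD of Example 3.5: every treatment occurs once in each of four blocks.
treatments = ["Control", "F1", "F2", "F3", "F4"]
rcbd = [(block, t) for block in range(1, 5) for t in treatments]

# A small incomplete block layout: three treatments in three blocks of two units.
incomplete = [(1, "A"), (1, "B"), (2, "A"), (2, "C"), (3, "B"), (3, "C")]

print(is_orthogonal(rcbd))        # True
print(is_orthogonal(incomplete))  # False
```

The second layout illustrates that a design can treat all pairs of treatments alike and yet not be orthogonal to blocks, which is the situation that arises with incomplete block designs.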



TABLE 3.3 

Randomization for the Potato Yields Trial as a RCBD with Five Treatments 
in Four Blocks of Five Plots (Example 3.5)



3.3.3 The Latin Square Design

The Latin square (LS) design is useful where patterns of heterogeneity are associated with 
two crossed structural factors with the same numbers of levels. Because this design was 
originally used for square layouts in field trials, the structural factors are often called Row 
and Column, corresponding to the spatial arrangement of the rows and columns of the 
layout, respectively. However, these factors often correspond to non-spatial factors, such as
time of day and observer. Using our symbolic notation, we write the crossed structure as 

Structural component: Row*Column 

or written in expanded form as 

Structural component: Row + Column + Row.Column 

This structure has three strata. The number of treatments must be equal to the numbers 
of rows and columns, and the treatment allocation is such that each treatment appears 
exactly once in each row and once in each column (see Figure 3.6a). 

Construction of a Latin square is more complex than for the RCBD, as the three-way 
inter-relationship between rows, columns and treatments must be preserved. Tables of 
standard Latin squares have been published for small numbers of treatments (e.g. see 
Cochran and Cox, 1957, or Fisher and Yates, 1963), but statistical software can be used to 
obtain Latin squares for any number of treatments. To generate a randomization, one stan- 
dard square of the right size is first selected at random. The order of the columns is then 
randomized, followed by the order of the rows (as illustrated in Figure 3.6). This random- 
ization preserves the structure of the design while giving a very large number of possible 
squares, and thus avoiding bias. 
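
The column-then-row randomization can be sketched in Python. As a simplification, this sketch always starts from the cyclic square rather than selecting one standard square at random, so it cannot reach every possible Latin square; the function names and the seed are illustrative.

```python
import random

def cyclic_square(treatments):
    """A Latin square in which each row is the previous row shifted by one place."""
    n = len(treatments)
    return [[treatments[(r + c) % n] for c in range(n)] for r in range(n)]

def randomize_latin_square(treatments, seed=None):
    """Randomize the column order, then the row order, of the cyclic square."""
    rng = random.Random(seed)
    n = len(treatments)
    square = cyclic_square(treatments)
    col_order = rng.sample(range(n), n)
    square = [[row[c] for c in col_order] for row in square]
    row_order = rng.sample(range(n), n)
    return [square[r] for r in row_order]

def is_latin(square):
    """True if every treatment occurs exactly once in each row and each column."""
    n = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all({square[r][c] for r in range(n)} == symbols for c in range(n))
    return len(symbols) == n and rows_ok and cols_ok

square = randomize_latin_square(["A", "B", "C", "D"], seed=3)
for row in square:
    print(" ".join(row))
print(is_latin(square))  # True
```

Permuting whole rows or whole columns cannot break the Latin property, which is why the randomized square still passes the check.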

EXAMPLE 3.6: LUPIN TRIAL 

An experiment was devised to investigate the effects of soil type and water availabil- 
ity on the growth of lupins. The experiment was to be done with pots on a bench in a 
glasshouse, with a systematic trend running along the bench (left-right) as a result of a 
temperature gradient, and across the bench (up-down) because of differing light levels. 

The rows and columns within the array of pots were therefore considered as blocking
factors with a crossed structure, and a LS design is appropriate. Four treatments,
labelled CL, CH, SL and SH, representing different combinations of soil type (clay or
sand) and the amount of water supplied (low or high) were used. A randomized layout
for a LS design for this experiment is shown in Table 3.4. It is straightforward to verify
that each treatment can be found once in each row and once in each column and that
each row or column contains all four treatments.

FIGURE 3.6

Randomization of a LS design. Rows, columns and treatments are labelled R1-R4, C1-C4 and A-D, respectively.
(a) Start with a standard LS design, then (b) randomize the order of the columns and (c) finally randomize the
order of the rows.

TABLE 3.4

Randomization for the Lupin Trial as a Latin Square Design with Four
Treatments Labelled CH, CL, SH and SL (Example 3.6)



The main disadvantage of the LS design is the restriction that the number of rows, col- 
umns and treatments must all be equal. This is discussed further in Section 9.1, where 
some extensions of the LS design are also described. 



3.3.4 The Split-Plot Design 

The split-plot (SP) design has a nested structure, and is used in the case where (at least) 
two treatment factors are present, with the levels of one treatment factor having to be 
applied to large experimental units while the levels of another treatment factor can be 
applied to smaller units. Here we consider a standard form of the SP design with two treat- 
ment factors, A and B, with a crossed structure, and a nested structural component with 
three strata. The highest level of structure corresponds to complete replicates of the set of 
treatments, and we denote this level using the factor Block. Each block is then divided into 
a number of whole plots (factor WPlot), with levels of treatment factor A randomized to the
whole plots separately within each block. Finally, each whole plot is divided into a number 
of subplots (factor Subplot), and the levels of factor B are randomized onto subplots within 
each whole plot. This design can be represented in symbolic form as 

Explanatory component: A*B 

Structural component: Block/WPlot/Subplot

EXAMPLE 3.7: WEED COMPETITION EXPERIMENT 

A field trial was set up to study the competitive effects of three different weed species 
in winter wheat under different levels of water stress. Variation in water stress was 
provided by the presence or absence of irrigation, which could be applied only to large 
areas of land whereas the weed species could be applied to small plots. A SP design was 
therefore deemed suitable, with the two irrigation treatments (factor Irrigation, with two 
levels) applied to whole plots (factor WholePlot, with two levels). Each whole plot was 
split into four subplots (factor Subplot, with four levels), and a pre-determined popula- 
tion of each weed species (Alopecurus myosuroides (black-grass), Galium aparine (cleavers) 
and Stellaria media (chickweed), abbreviated as Am, Ga and Sm, respectively) was sown 
in one of these four subplots. The remaining subplot within each whole plot had no 
weed seeds added, and it was used as a control. Factor Species was used to label the
four weed treatments, i.e. the three added populations and control. This structure was
repeated another three times, giving four blocks (factor Block, with four levels), with a
different randomization in each block, as shown in Table 3.5. The model for this design
can be written in symbolic form as

Explanatory component: Irrigation*Species

Structural component: Block/WholePlot/Subplot

TABLE 3.5

Randomization for the Weed Competition Experiment as a Split-Plot Design with
Two Whole-Plot Treatments (Irrigated, Highlighted in Grey, and Non-Irrigated) and
Four Subplot Treatments (Weed Species Am, Sm, Ga or Control, -) (Example 3.7)
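
The two stages of randomization in a split-plot design of this form can be sketched in Python. The function name randomize_split_plot, the treatment labels and the seed are illustrative, and the plan generated is not the one shown in Table 3.5.

```python
import random

def randomize_split_plot(whole_plot_trts, subplot_trts, n_blocks, seed=None):
    """Randomize whole-plot treatments within blocks, then subplot treatments
    within whole plots. Returns one tuple per subplot:
    (block, whole plot, subplot, whole-plot treatment, subplot treatment)."""
    rng = random.Random(seed)
    plan = []
    for block in range(1, n_blocks + 1):
        # Stage 1: allocate the whole-plot treatments within this block.
        wp_order = rng.sample(whole_plot_trts, len(whole_plot_trts))
        for whole_plot, wp_trt in enumerate(wp_order, start=1):
            # Stage 2: allocate the subplot treatments within this whole plot.
            sp_order = rng.sample(subplot_trts, len(subplot_trts))
            for subplot, sp_trt in enumerate(sp_order, start=1):
                plan.append((block, whole_plot, subplot, wp_trt, sp_trt))
    return plan

irrigation = ["Irrigated", "Non-irrigated"]
species = ["Am", "Ga", "Sm", "Control"]
plan = randomize_split_plot(irrigation, species, n_blocks=4, seed=4)

for unit in plan[:8]:  # the eight subplots of block 1
    print(unit)
```

Each of the eight Irrigation-by-Species combinations occurs once per block, but irrigation is constant within a whole plot, so the two treatment factors are compared in different strata.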

The statistical analysis for, drawbacks of and variations on this design are discussed in
Section 9.2.



3.3.5 The Balanced Incomplete Block Design 

The balanced incomplete block design (BIBD) can be useful when there is only one
blocking factor but the number of units per block is smaller than the number of treatments. In
this case, each block can contain only a subset of the treatments, and designs with this
property are known as incomplete block designs. A BIBD has the additional
characteristic of balance, which requires that all treatment comparisons have equal precision. This
is achieved if the treatments have equal replication and each pair of treatments occurs
together within a block exactly the same number of times over the whole experiment. If we
again use factor Block to label the blocks, and factor Unit to label the units within blocks,
then this design has the same nested blocking structure as the RCBD, represented as

Structural component: Block/Unit 

Construction of a BIBD is more complex than for the RCBD, as the balanced inter-rela- 
tionship between blocks and treatments must be preserved. Tables of standard BIBDs 
have been published (e.g. see Cochran and Cox, 1957, or Fisher and Yates, 1963) and can 
be used to generate a BIBD. These designs are also available in many statistical packages. 
The first step in construction is to choose a standard design with the right block size and 
number of treatments. The standard layout is then randomized first by randomization 
of the order of the blocks, and then randomization of the order of the treatments present 
within each block.

TABLE 3.6

Randomization for Grain Protein Content Experiment as a BIBD with Six Treatments
Labelled A-F in Six Sessions (Blocks), Each Containing Five Samples (Units) (Example 3.8)

EXAMPLE 3.8: GRAIN PROTEIN CONTENT*

An experiment was devised to evaluate the grain protein content for six different
varieties of pea (labelled A, B, C, D, E and F). Five independent samples of grain were available
for each variety. Only five samples (factor Sample, five levels) could be assessed within
a session (factor Session, six levels), with possible heterogeneity between sessions, so
a BIBD with six blocks (corresponding to sessions), each containing five units
(corresponding to samples) was used. The structural component was specified as

Structural component: Session/Sample 

A randomized plan for this design is shown in Table 3.6. For this design each variety is
replicated five times, as each appears in five of the six sessions, and any pair of varieties
is present together in four sessions, for example, varieties E and F are both present in
sessions 1, 2, 3 and 6.
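
The balance properties quoted here can be verified with a short Python sketch. It assumes the simplest construction with these dimensions, in which each session omits a different variety; the function names and the seed are illustrative, and the randomized plan will not match Table 3.6.

```python
import random
from itertools import combinations

def leave_one_out_blocks(treatments):
    """One block per treatment, each containing every treatment except that one."""
    return [[t for t in treatments if t != omitted] for omitted in treatments]

def randomize_blocks(blocks, seed=None):
    """Randomize the order of the blocks, then the order of units within each block."""
    rng = random.Random(seed)
    blocks = rng.sample(blocks, len(blocks))
    return [rng.sample(block, len(block)) for block in blocks]

def replication(blocks):
    """Number of blocks in which each treatment occurs."""
    counts = {}
    for block in blocks:
        for t in block:
            counts[t] = counts.get(t, 0) + 1
    return counts

def concurrences(blocks):
    """Number of blocks in which each pair of treatments occurs together."""
    counts = {}
    for block in blocks:
        for pair in combinations(sorted(block), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return counts

varieties = list("ABCDEF")
plan = randomize_blocks(leave_one_out_blocks(varieties), seed=5)

for session, samples in enumerate(plan, start=1):
    print(f"Session {session}: " + " ".join(samples))
print(set(replication(plan).values()))   # {5}
print(set(concurrences(plan).values()))  # {4}
```

Equal replication (five) and an equal number of concurrences for all 15 pairs of varieties (four) are the two conditions for balance stated above.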

One drawback of BIBDs is that the range of available designs is fairly limited: it is not 
always possible to construct a BIBD for a given number of treatments, number of blocks 
and block size (number of units per block). More details are given in Section 9.3. 



3.3.6 Generating a Randomized Design 

Once a design has been chosen, and the numbers of treatments and replicates have been 
defined, then a randomized layout or plan for the experiment can be generated. Most
general statistical software packages (including GenStat, R and SAS) have some facilities for
generating standard designs, including most of those considered in this book. For non-standard
designs (including some BIBDs), more specialist software, such as CycDesigN (see
http://www.vsni.co.uk/software/cycdesign) must be used.



EXERCISES 

3.1 Suppose that you are planning an experiment to investigate the impact of nutrient 
deprivation on plant metabolites. You have four different nutrient levels to test, 
obtained by applying appropriate nutrients to four sub-samples from a single bag 
of base compost. The resulting volume of each nutrient level is sufficient for four 
seed trays (i.e. 16 seed trays in total), and six plants will be grown in each seed tray. 
To achieve the required growing conditions, a small CE cabinet will be used. The 
cabinet has four shelves, and you can fit four seed trays on each shelf in a 2 x 2
arrangement (Figure 3.7). Although the cabinet is supposed to provide a uniform
environment, a technician suggests that light levels may vary between the shelves,
and that this might affect plant growth. When they reach the required growth
stage, the six plants from each seed tray will be bulked and processed together
to form a single sample to be read by a machine. Your colleague tells you that the
machine shows some drift over time, but that readings should stay stable across a
set of up to six samples.

FIGURE 3.7

Structure of a CE cabinet to be used for an experiment to investigate the impact of nutrient deprivation on plant
metabolites (Exercise 3.1).

How might you design this experiment to obtain an unbiased assessment of 
differences between the four nutrient levels? Consider and discuss the different
factors which might affect your choice of design and produce a candidate design.
You should consider both stages of the experiment and the following issues:

• What is the experimental unit for the nutrient treatments?

• What are the sources of heterogeneity in the experimental process?

• How might you deal with this heterogeneity?

• How would you allocate the treatments to the experimental units?

• What replication do you have for each treatment?

• What are the advantages/disadvantages of your design?

How would you modify your design if

a. A temperature gradient was discovered between the front and back of the
shelves

b. You want to include a CO2 treatment that can only be applied to a whole CE
cabinet and you obtain sufficient resources for 32 trays (eight for each nutrient
level)

3.2 Identify the experimental unit, the replication for each treatment and whether
pseudo-replication is present in the following experiments. 

a. A pot experiment with 12 circular pots in a 2 x 6 array, in a uniform environ- 
ment. Each pot contains four plants at the three-leaf stage, and each of four 
treatments (labelled A-D) were applied at random to one plant per pot as 
shown in Figure 3.8. 

b. A field experiment with 12 homogeneous rectangular plots in a 3 x 4 grid. 
Two treatments (labelled A and B) were applied at random to six plots each 
(Figure 3.9). At harvest, 25 plants are to be sampled per plot, and the plants 
from each plot will be processed as a single batch for measurement. 

c. The field experiment described in part (b) (Figure 3.9) with the height of 25 
individual plants per plot measured and recorded in situ at 4-weekly intervals 
from tillering until harvest. 

3.3 Four replicates of each of four treatments, labelled A-D, are to be applied at ran- 
dom to batches of aphids in 16 Petri dishes laid out in a 2 x 8 array (Figure 3.10). 
The environment is thought to be homogeneous. Use a pack of playing cards to 
determine an appropriate randomization for this experiment. 



FIGURE 3.8 

Experimental layout for a pot experiment with 12 pots and four treatments, labelled A-D, applied to plants 
within pots. Letters denote the positions of the plants and the treatment applied (Exercise 3.2a). 

FIGURE 3.9 

Experimental layout for a field experiment with plots in a 3 x 4 grid, showing the allocation of treatments (A or 
B) to plots (Exercises 3.2b and c). 

FIGURE 3.10 

Layout of 16 numbered Petri dishes for an aphid experiment using four replicates of four treatments 
(Exercise 3.3). 

FIGURE 3.11 

Layout (with numbered plots) for an experiment testing six treatments in a field with a pH gradient running 
from west to east (Exercise 3.4). 



3.4 Four novel herbicides (labelled A-D) are to be compared with a commercial 
product (labelled P) and a hand-weeded control (labelled H) in a field trial, giving 
six treatments in total. The field available can accommodate 24 plots in an 
array of four columns running north to south, by six rows running west to east 
(Figure 3.11). The field has a known pH gradient running west to east, i.e. along 
rows, which may affect crop growth. Produce a RCBD which accounts for this 
gradient with a randomized allocation of treatments to plots using (a) a pack of 
playing cards, and (b) a standard six-sided die. 

3.5 The efficacy of six synthetic insect pheromones is to be tested in the field. Traps 
are baited with a single pheromone, deployed at dusk, left out overnight, then 
retrieved the next morning and the insect catches recorded. There is sufficient 
material to bait six traps with each pheromone. 

a. Consider how you might use a RCBD for this experiment if only six traps are 
available at any one time and all six traps will be placed in the same field, 
but a different field will be used each night. Are any structural factors con- 
founded? What are the assumptions of this design? Write down the structural 
component for this design. 

b. How would you change your design if the same trap locations were to be used 
each night and the positions could not be considered homogeneous? Write 
down the structural component for this design. 

c. How might you modify your design if 18 traps are available at any one time? 
Under which conditions would designs based on CRD, RCBD or LS arrange- 
ments be preferable? 

d. What design might you use if only four pheromones are to be tested, with six 
traps available at any one time, and enough material for nine replicates of each 
pheromone? 

3.6 The effect of temperature on the transmission of a virus by five aphid species is 
to be investigated. Three small growth chambers are available and three tem- 
peratures will be tested. The temperature for each chamber can be set and then 
applies to the whole chamber, and each chamber can hold five plants in indi- 
vidual pots. One aphid will be placed onto each plant using a clip cage. Forty-five 
plants and 15 aphids of each species are available. Assuming that chambers (and 
positions within chambers) can be considered homogeneous, suggest a design 
to test the effects of temperature and aphid species. What are the experimen- 
tal units for each factor? Produce a randomized design for this experiment and 
write down the explanatory and structural components for the design. If you 
suspected that there were systematic differences between chambers, how would 
you modify your design? Write down the structural component for this new 
design. 

3.7 A field experiment was set up to investigate how invertebrate abundance is 
affected by the spatial structure and species composition of weed patches 
(Smith, 2007). Small weed patches were formed from three pots of plants in a 
tray. Species composition was varied by using different numbers of mayweed 
(M) or thistle (T) plants in the patch, i.e. 3M, 2M + 1T, 1M + 2T or 3T. Spatial 
structure was varied by changing the distance between patches (12 or 6 m). Five 
blocks of two whole plots were set up, with the two spacings allocated at random 
to whole plots within blocks. Each whole plot contained 16 patches laid out in a 
4 x 4 array with the designated spacing, with patches allocated to four replicates 
of each of the four species compositions according to a LS design. A different 
randomization was used within each whole plot. Write down the explanatory 
and structural components for this design. 

3.8 A glasshouse experiment to compare the effect of two nutrition regimes on the 
growth of three wheat varieties was set up as a RCBD with 12 blocks of six pots 
each, as shown in Figure 3.12. The treatments comprise the six combinations 
of nutrition regime (labelled N1, N2) and variety (labelled V1-V3). The blocks 
accommodate an expected temperature gradient running from the door to the 
far end of the glasshouse. Several characteristics of each plant, including height 
and number of leaves, are to be recorded every week. Suggest acceptable protocols 
for recording data if 

a. You are the only person available to take the measurements 

b. There are two people available to take the measurements 

Which protocols would be unacceptable and why? 

FIGURE 3.12 

Layout of pots (labelled 1-72) as a RCBD in a greenhouse experiment to compare the effects of two nutrition 
regimes (N1 and N2) on the growth of three wheat varieties (V1, V2 and V3). Blocks (columns) contain six pots. 
One block shows treatment labels in addition to pot numbers (Exercise 3.8). 



4 

Models for a Single Factor 



In this chapter, we present the analysis for data classified by a single explanatory factor. 
In the context of designed experiments, this would correspond to the case of a completely 
randomized design (Section 3.3.1) with a single treatment factor. Equivalently, this 
type of data might arise from an observational study in which the observations have been 
selected to conform to a single pre-defined classification, or grouping variable. In both 
cases, the only structure in the data is the treatment or grouping factor; there must be 
no other explanatory variables and no other structure, such as blocking or pseudo-replication, 
associated with the experimental material. If any such structure is present, then 
you should use a more complex analysis (see Chapters 7, 9 and 16 for details). In the case 
where only a single factor - representing treatments or groups - is present, the aim of the 
analysis is to discover if there are any differences in response between the factor levels. For 
brevity, here we use the term 'treatments' to cover either a set of imposed treatments or a 
set of observed groups. The first step in the analysis is to write down a model for the data 
in terms of the unknown population mean for each treatment (Section 4.1). The principle 
of least squares is used to estimate these treatment means (Section 4.2). The technique of 
ANOVA is then used to partition the variation in the data (Section 4.3). This analysis serves 
several purposes: we can obtain an estimate of the background variation, which in turn 
is used to indicate uncertainty on estimates of treatment means; we can also obtain an 
estimate of the amount of variation in the data accounted for by treatment differences, and 
compare this with the background variation. If the variation between treatments is large 
compared with the background variation, then we conclude that substantive differences 
between treatments are present in the data. This comparison is formalized in an F-test, 
and differences between pairs of treatment means can be compared with the standard 
error of the difference (SED) to identify significant differences between responses for different 
treatments (Section 4.4). There are several forms (parameterizations) of the ANOVA 
model for a single treatment factor, and we explain some of the different forms used in 
statistical software (Section 4.5). 



4.1 Defining the Model 

Here, we consider a set of observations classified by a single treatment factor. It is natural 
to label observations by their factor level, i.e. the treatment group to which they belong, 
and then to number observations within each treatment. We use a general notation that 
can be adapted to apply to any data set. The treatments are labelled by index j, and the 
number of treatments is denoted as t. We label observations within treatments using index 
k, allow the replication to differ between treatments, and denote the number of observations 
for the jth treatment as $n_j$. Then, $y_{jk}$ represents the response from the kth observation 
on the jth treatment and the full set of responses can be denoted as $y_{jk}$, j = 1 ... t, k = 1 ... $n_j$ 
(see Section 2.1 for more explanation of this notation). The total number of observations is 
the sum of the number of replicates across all the treatments, denoted $N = n_1 + n_2 + \cdots + n_t$ 
or $N = \sum_{j=1}^{t} n_j$. 

The only structure associated with the observations is due to the treatment groups, but 
there will also be variation among the responses within each treatment. We can describe 
this structure using a simple model, as 

$$y_{jk} = \mu_j + e_{jk}, \qquad (4.1)$$

where $\mu_j$ is the true (but unknown) population mean for the jth treatment, and $e_{jk}$ is the 
deviation of the kth response on the jth treatment from its population mean. The set of t 
unknown population means, $\mu_j$, j = 1 ... t, are the parameters of this model. Equation 4.1 
essentially says that each observation consists of two parts, a contribution due to the treatment 
and a deviation due to the individual. These two parts correspond to the systematic 
and random components, respectively, of the general model described in Section 1.3. An 
example of this situation is shown in Figure 1.1a. The individual deviation is sometimes 
called random noise, residual error or measurement error. The term 'error' here just reflects 
the presence of variation, and hence uncertainty in ascertaining true population values: it 
is not intended to imply that a mistake has occurred. Hence, we prefer the alternative 
term deviation. In general, the deviation reflects natural between-unit biological variation, 
variation within the study environment and inaccuracies in measurement. The deviations 
are regarded as random, without any structure related to the experimental units and not 
under control of the experimenter. 

Note that the labelling used here for the observations, subscripts j and k, is chosen to 
identify the observations associated with each treatment. Other labelling schemes, such 
as use of subscript i to number the units in the order of the experimental layout, are 
equally valid, and are sometimes preferable. For example, ordering by layout is required 
to check for spatial trends. We strongly recommend recording information on the full 
experimental layout within any data set so that the link between the treatments and units 
is retained. 

Using the symbolic notation introduced in Chapter 3, we can represent the model in 
Equation 4.1 as 

Response variable: Y 

Explanatory component: Treatment 

Here, we have a variate named Y containing the observed response and a factor called 
Treatment that identifies the treatment group from which each observation arises. Note 
that we use italic font to distinguish variate names from factor names. In this case, 
the explanatory component is represented by a single factor and there is no structural 
component. 

To make inferences on the unknown parameters in Equation 4.1 (and in any linear model), 
we make some assumptions about the deviations. These assumptions are given here for the 
general case, so, for simplicity, we replace the subscript jk by the subscript i so that $e_i$ is the 
deviation corresponding to the ith unit (for i = 1 ... N). In estimating the unknown parameters, 
we assume that $e_i$ is a realization of a random variable with the properties 

Assumption 1 

$$E(e_i) = 0 \quad \text{for } i = 1 \ldots N.$$

The expected value (function E) of each deviation is assumed to be zero. This means that 
the population mean of the deviations is zero, which implies no systematic bias in the 
observations. ■ 

Assumption 2 

$$\text{Var}(e_i) = \sigma^2 \quad \text{for } i = 1 \ldots N.$$

The variances (Var) of the deviations are the same for all units. This is also known as 
homoscedasticity, or homogeneity of variances. ■ 

Assumption 3 

$$\text{Cov}(e_i, e_j) = 0 \quad \text{for all } i \neq j, \text{ and } i, j = 1 \ldots N.$$

The covariance (Cov) between deviations for two separate observations is zero and the 
deviations are independent. ■ 

Assumption 4 

$$e_i \sim \text{Normal}(0, \sigma^2) \quad \text{for } i = 1 \ldots N.$$

The deviations follow a Normal distribution with mean 0 and variance $\sigma^2$. ■ 

In addition, we make an assumption on the explanatory variables: 

Assumption 5 

The values of the explanatory variables (factors or variates) are known without error. ■ 



The first three assumptions require that the deviations are independent and identi- 
cally distributed, i.e. arise from the same underlying probability distribution. The fourth 
assumption adds the requirement that this is the Normal distribution, and this is necessary 
to make valid statistical inferences that rely on the properties of the Normal distribution. 
This includes significance testing (F-test or t-test, see Sections 4.3 and 4.4) and the calculation 
of confidence intervals (CIs) (Section 4.4). In Chapters 5, 6 and 18, common violations 
of Assumptions 1-4 are discussed in detail. For the case of data with a single explanatory 
factor, Assumption 5 requires that each observation can be allocated to a treatment group 
without error, which is usually a realistic requirement. In general, when explanatory variates 
have been measured, possibly with error, then the assumption may become unrealistic. 
Consequences of violating this assumption are discussed in Chapter 13. 

In some areas of biological science, it is common practice to summarize treatment 
responses from an experiment graphically, with bar charts showing the sample means 
and standard deviations (SDs) for each treatment, as in Example 4.1A. This can be a useful 
precursor to a formal statistical analysis as it provides an informal check as to whether the 
observations comply with Assumption 2, i.e. that all random deviations share a common 
variance. If this is the case, then the sample SDs should be roughly equal across treat- 
ments. In practice, for little or moderate replication, there may appear to be differences in 
sample SDs across treatments even when the assumption is true. 
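
The last point can be explored with a small simulation. The sketch below is illustrative only: the number of treatments (four), the replication (five), the common standard deviation (8.0), the number of simulated experiments and the seed are all assumed values, not taken from any data set in this book. It draws deviations that satisfy Assumptions 1-4 and records, for each simulated experiment, the ratio of the largest to the smallest sample SD.

```python
import random
import statistics

def sd_ratio(n_treatments, n_reps, sigma, rng):
    """Simulate one experiment with a common variance; return max SD / min SD."""
    sds = []
    for _ in range(n_treatments):
        # Deviations are independent Normal(0, sigma^2) draws (Assumptions 1-4).
        deviations = [rng.gauss(0.0, sigma) for _ in range(n_reps)]
        sds.append(statistics.stdev(deviations))  # unbiased sample SD (divisor n - 1)
    return max(sds) / min(sds)

rng = random.Random(2024)  # fixed seed (an arbitrary choice) so the run is repeatable
ratios = [sd_ratio(n_treatments=4, n_reps=5, sigma=8.0, rng=rng) for _ in range(2000)]

# Typical spread of sample SDs when the true SD is identical for every treatment.
print("median ratio of largest to smallest sample SD:",
      round(statistics.median(ratios), 2))
```

Under these assumed settings the ratio is usually well above 1, even though every treatment shares the same true variance, so modest differences between sample SDs should not be over-interpreted when replication is small.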

EXAMPLE 4.1A: CALCIUM POT TRIAL* 

An experiment devised to evaluate the effect of four relative concentrations (levels) of 
calcium (A = 1, B = 5, C = 10, D = 20) on root growth was introduced in Example 3.4. Each 
treatment was applied to five individual plants growing in pots. The experiment used a 
CRD, as shown in Table 3.2, and measurements of total root length (cm) were made on 
pots 1-20 (in order) at the end of the experiment. The data set is presented in summary 
form in Table 4.1, and can be found in file calcium.dat in the flat file format required by 
statistical software. This format puts the explanatory variables and responses into par- 
allel columns. Here, the file contains three columns: a variate (Pot) to uniquely identify 
each pot; a factor (Calcium, with four levels) to identify the treatment group for each pot; 
and the response variate (Length) obtained from each pot. 

The model for these data can be written in symbolic notation as 

Response variable: Length 

Explanatory component: Calcium 

We also use an informal version of the mathematical model given in Equation 4.1, and 
write this model in a more interpretable form that is relevant to the data, as 

$$\text{Length}_{jk} = \text{Calcium}_j + e_{jk},$$

where $\text{Length}_{jk}$ represents the root length for the kth plant in the jth treatment group 
(with j = 1 ... 4 corresponding to A-D respectively), $\text{Calcium}_j$ represents the mean of the 
jth treatment group and $e_{jk}$ is the deviation for the kth plant in the jth treatment group. 

Before a formal analysis, it can be helpful to use sample statistics, such as treatment 
means and SDs, to summarize the data. These values are given in Table 4.1 and also 
shown graphically in a bar chart in Figure 4.1. The sample grand mean, $\bar{y}$, is equal to 
63.15 cm. Root growth appears to be greater for calcium levels B and C and the SDs are 
broadly similar across treatments A-C, but appear smaller for calcium level D. This plot 
is exploratory and we discuss better ways to present experimental results in Section 4.4. 

TABLE 4.1 

Calcium Pot Trial Data: Total Root Length (cm) and Summary Statistics for Plants Treated with 
Four Relative Concentrations of Calcium (A-D) according to a CRD (Example 4.1A and File 
calcium.dat) 

Replicate                  A         B         C         D 
1                         58        80        49        47 
2                         52        68        70        49 
3                         74        72        72        45 
4                         58        74        74        48 
5                         79        85        71        38 
Treatment mean          64.2      75.8      67.2      45.4 
Treatment variance    135.20     45.20    105.70     19.30 
Treatment SD           11.63      6.72     10.28      4.39 

FIGURE 4.1 

Summary statistics: treatment means and unbiased sample standard deviations (SD) for a CRD measuring root 
lengths (cm) under four calcium concentrations (A, B, C, D), each with five replicates (Example 4.1A). Vertical 
bars represent ± 1 x SD for each treatment. 
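
The summary statistics in Table 4.1 can be reproduced with a few lines of code. The following is a minimal sketch using only the Python standard library; the root lengths are typed in from Table 4.1 rather than read from calcium.dat, and the variable names (`root_length`, `summary`) are illustrative choices.

```python
import statistics

# Root lengths (cm) for the five replicates of each calcium level (Table 4.1).
root_length = {
    "A": [58, 52, 74, 58, 79],
    "B": [80, 68, 72, 74, 85],
    "C": [49, 70, 72, 74, 71],
    "D": [47, 49, 45, 48, 38],
}

summary = {}
for level, ys in root_length.items():
    summary[level] = {
        "mean": statistics.mean(ys),
        "variance": statistics.variance(ys),  # unbiased: divisor n - 1
        "sd": statistics.stdev(ys),           # square root of the unbiased variance
    }
    print(level, round(summary[level]["mean"], 1),
          round(summary[level]["variance"], 2), round(summary[level]["sd"], 2))

# Sample grand mean across all 20 observations.
all_obs = [y for ys in root_length.values() for y in ys]
grand_mean = statistics.mean(all_obs)
print("grand mean:", round(grand_mean, 2))
```

Note that `statistics.variance` and `statistics.stdev` use the divisor n - 1, matching the unbiased sample statistics reported in the table.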



4.2 Estimating the Model Parameters 

The parameters associated with the model in Equation 4.1, namely the population means 
for each treatment, $\mu_j$, j = 1 ... t, are unknown quantities to be estimated from the data. 
Recall (from Section 1.5) that we denoted parameter estimates by placing a hat symbol 
(^) above the parameter symbol. The estimated population mean for the jth treatment is 
thus denoted $\hat{\mu}_j$. Recall also that the fitted value for an observation, denoted $\hat{y}_{jk}$, consists 
of the systematic component of the model with parameters replaced by their estimates. 
Hence, here 

$$\hat{y}_{jk} = \hat{\mu}_j.$$

The parameters are estimated by the principle of least squares, which finds the values 
of the parameters that minimize the sum of the squares of the differences between the 
observed data and their fitted values, called the residual sum of squares (ResSS), which 
can be written mathematically as 

$$\text{ResSS} = \sum_{j=1}^{t} \sum_{k=1}^{n_j} (y_{jk} - \hat{y}_{jk})^2 = \sum_{j=1}^{t} \sum_{k=1}^{n_j} (y_{jk} - \hat{\mu}_j)^2. \qquad (4.2)$$

The details of the minimization are not required to understand the principles, but are 
shown in Section C.1 for interested readers. The resulting estimate of the population mean 
for the jth treatment is the sample mean of the observed responses for that treatment, i.e. 

$$\hat{\mu}_j = \frac{1}{n_j} \sum_{k=1}^{n_j} y_{jk} = \bar{y}_{j\cdot}.$$

This sample mean is written with the dot notation introduced in Section 2.1. Recall that 
the dot symbol in the position of index k indicates summation across all values of that 
index, i.e. for k = 1 ... $n_j$, with the other index (or, in general, indices) held fixed. The bar 
symbol over the y indicates that the mean is calculated by division of the sum by the 
number of components in the summation, here $n_j$. 
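
The least squares property can be checked numerically without any algebra. The sketch below is not the derivation of Section C.1, but a simple illustration under stated assumptions: it uses the root lengths from Table 4.1, evaluates the residual sum of squares at the treatment sample means, and then moves the estimate for treatment A away from its sample mean by a few arbitrary amounts. The function name `residual_ss` and the shift values are hypothetical choices.

```python
# Root lengths (cm) for the five replicates of each calcium level (Table 4.1).
root_length = {
    "A": [58, 52, 74, 58, 79],
    "B": [80, 68, 72, 74, 85],
    "C": [49, 70, 72, 74, 71],
    "D": [47, 49, 45, 48, 38],
}

def residual_ss(data, estimates):
    """Sum of squared differences between each response and its treatment's estimate."""
    return sum((y - estimates[level]) ** 2
               for level, ys in data.items() for y in ys)

# Least squares estimates: the treatment sample means.
sample_means = {level: sum(ys) / len(ys) for level, ys in root_length.items()}
minimum = residual_ss(root_length, sample_means)

increases = {}
for shift in (-2.0, -0.5, 0.5, 2.0):
    candidate = dict(sample_means)
    candidate["A"] += shift  # move one estimate away from its sample mean
    increases[shift] = residual_ss(root_length, candidate) - minimum
    print(f"shift {shift:+.1f}: ResSS increases by {increases[shift]:.2f}")
```

Any shift away from the sample mean increases the residual sum of squares; for a shift of size d applied to one treatment, the increase is the replication of that treatment multiplied by d squared, because the deviations about a sample mean sum to zero.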

EXAMPLE 4.1B: CALCIUM POT TRIAL* 

From Table 4.1, we can now estimate the population means for the four calcium treat- 
ments as 

$$\widehat{\text{Calcium}}_1 = 64.2; \quad \widehat{\text{Calcium}}_2 = 75.8; \quad \widehat{\text{Calcium}}_3 = 67.2; \quad \widehat{\text{Calcium}}_4 = 45.4.$$



Having obtained the treatment sample means as estimates of the treatment population 
means, we require some measure of the uncertainty in these estimates to form standard 
errors and CIs. It is also helpful to be able to formally test whether the data show evidence 
of substantive differences among the treatment population means. The ANOVA provides 
a framework for estimation of the within- and between-treatment variances, and formal 
tests of differences among treatment means. Given that the assumption of a common variance 
for the set of deviations (denoted $\sigma^2$) is realistic, a pooled estimate of this variance 
(denoted $s^2$) is obtained from ANOVA of the data, and can be used to derive standard 
errors and CIs for single treatments or treatment comparisons. 



4.3 Summarizing the Importance of Model Terms 

For a set of observations classified by a single treatment factor, the primary question of 
interest is whether there are any differences in the responses among treatments. Analysis 
of variance (ANOVA) is a statistical technique that enables us to address this question 
directly. Intuitively, it is easy to see that if treatment differences exist, then variation in 
responses among observations for different treatments ('between treatments') will be 
larger than the variation in responses among observations for the same treatment ('within 
treatments'), with the additional variation being directly attributable to the treatment dif- 
ferences. ANOVA uses this principle to partition the total variation for a given response 
into the variation attributable to treatment differences and the variation due to random 
deviations (which we refer to as background variation). 

In the general case, given any model for a set of observations in the form described in 
Section 1.3, i.e. 



response = systematic component + random component, 



ANOVA considers the variation in the response associated with all the parts of the systematic 
component of the model (systematic variation) and compares it to the background 
variation associated with the random component of the model. Informally, if the ratio of 
systematic variation to background variation is large, then we can conclude that the proposed 
model accounts for much of the variation in the response, and that the explanatory 
variable(s) provide a good explanation of the observed response. Formally, ANOVA can 
be used to test various hypotheses about the form of the systematic component, from the 
simplest case of a single treatment factor, presented in this chapter, to the more complex 
explanatory and structural components described in Chapters 7 to 9. 

The simplest application of ANOVA, to data classified by a single factor, is usually 
referred to as one-way analysis of variance. This analysis can be regarded as an extension 
of the two-sample t-test to allow comparison of several treatments simultaneously (with 
exact equivalence of these methods in the case of only two treatments). Recall that the 
two-sample t-test (Section 2.4.2) is used to compare the observed difference between two 
sample means with the expected variation in that difference, based on a pooled estimate 
of the background variation. The t-test was used to evaluate a null hypothesis of equality 
of the population means for two treatments against an alternative hypothesis that the 
population means were not equal. In the case where there are several (t) treatments, it is 
also of interest to establish whether there is any evidence of differences among population 
means for the treatments. The null hypothesis is that the treatment population means are 
all equal. Mathematically, this is written as 

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_t.$$

Given the assumptions underlying the analysis (presented in Section 4.1), if this hypothesis 
is true, then it implies that the observations for all treatments arise from a single Normal 
distribution with common mean, $\mu$ (e.g. Figure 4.2a). The general alternative hypothesis, 
$H_1$, is that the treatment population means are not all equal. Taken in combination with 
the model assumptions, this hypothesis implies that the observations arise from a set of 
Normal distributions with a common variance, but with some variation within the set of 
population means (e.g. Figure 4.2b). 

The test of this null hypothesis compares the variation among treatments with the background 
variation. In common with the two-sample t-test of Section 2.4.2, this test uses an 
estimate of the background, or unit-to-unit, variability that is pooled across all treatment 
groups, in accordance with the assumption of common variance underlying the analysis. 

FIGURE 4.2 

(a) The assumed distribution of responses for a single factor model under the null hypothesis (treatment means 
equal), and (b) the assumed distributions for a single factor model with four groups (A, B, C, D) in a case where 
the null hypothesis is not true (treatment means not equal). 

In the following sections, we give details of the ANOVA calculations for a model with 
a single treatment factor to demonstrate the basic principles of the approach. Nowadays, 
statistical packages are usually used for these calculations and so it is not necessary to give 
detailed formulae for all cases. We present full details for the RCBD (Chapter 7), but ANOVA 
for more complex structures is discussed in less mathematical detail in Chapters 8 and 9. 

The calculations in this section are presented for the general case of unequal replication 
across treatments, but we also give the simpler formulae used for the case of equal replication 
(i.e. $n_j$ = n for j = 1 ... t). 



4.3.1 Calculating Sums of Squares 

As stated above, the aim of ANOVA is to partition the variation in the response between 
two or more sources. The statistics used to quantify variation are initially calculated as 
sums of squared deviations about means, and hence referred to as sums of squares. The 
total sum of squares (TotSS) is related to the sample variance, and we calculate it by taking 
the difference between each observed value and the sample grand mean, squaring these 
differences and then adding them together. Algebraically, this is written as 

$$\text{TotSS} = \sum_{j=1}^{t} \sum_{k=1}^{n_j} (y_{jk} - \bar{y})^2. \qquad (4.3)$$

Note that this is equal to (N - 1) x the unbiased sample variance (Equation 2.3). With equal 
replication, the expression becomes 

$$\text{TotSS} = \sum_{j=1}^{t} \sum_{k=1}^{n} (y_{jk} - \bar{y})^2.$$

This calculation is illustrated in Figure 4.3, for the case of two treatments (t = 2) each with 
four replicates (n = 4). The lengths of the vertical lines represent the differences to be 
squared and then added together. 
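
As a check on Equation 4.3 and its link to the sample variance, the short sketch below computes the total sum of squares in two ways for the root lengths of Table 4.1: directly from the definition, and as (N - 1) x the unbiased sample variance. It is an illustration under stated assumptions only; the variable names are arbitrary choices.

```python
import statistics

# Root lengths (cm) for the five replicates of each calcium level (Table 4.1).
root_length = {
    "A": [58, 52, 74, 58, 79],
    "B": [80, 68, 72, 74, 85],
    "C": [49, 70, 72, 74, 71],
    "D": [47, 49, 45, 48, 38],
}

# Pool all observations, ignoring the treatment structure.
observations = [y for ys in root_length.values() for y in ys]
n_total = len(observations)
grand_mean = sum(observations) / n_total

# Equation 4.3: squared differences from the sample grand mean, summed.
tot_ss = sum((y - grand_mean) ** 2 for y in observations)

# The same quantity via the unbiased sample variance (divisor N - 1).
via_variance = (n_total - 1) * statistics.variance(observations)

print("TotSS from definition:", round(tot_ss, 2))
print("TotSS from variance:  ", round(via_variance, 2))
```

The two values agree up to floating-point rounding, which is simply a restatement of how the unbiased sample variance is defined.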

For a one-way ANOVA, this total variation is then partitioned into the variation between 
treatments (or the treatment sum of squares, denoted TrtSS), and the background variation, 
which is quantified by the ResSS. Variation between treatments is calculated as the 
sum, over all observations, of the squared differences between the appropriate treatment 
sample mean and the sample grand mean, written algebraically as 

$$\text{TrtSS} = \sum_{j=1}^{t} \sum_{k=1}^{n_j} (\bar{y}_{j\cdot} - \bar{y})^2 = \sum_{j=1}^{t} n_j (\bar{y}_{j\cdot} - \bar{y})^2.$$

As the contributions from observations within the same treatment group are repeated, 
the expression can be simplified (as shown) to be a sum across index j (the different 
treatment groups) multiplied by the replication ($n_j$) for each treatment. This calculation 
is illustrated in Figure 4.4, where again the lengths of the vertical lines represent the 
differences to be squared and then added together. With equal replication of all treatments, 
this becomes 

$$\text{TrtSS} = n \sum_{j=1}^{t} (\bar{y}_{j\cdot} - \bar{y})^2.$$

FIGURE 4.3 

Calculation of the total sum of squares (TotSS) for a single factor model with two treatment groups (j = 1, 2) and 
four replicates per group (k = 1 ... 4). Each vertical line represents a difference between a response and the sample 
grand mean, or $y_{jk} - \bar{y}$. 


The final step is the calculation of the ResSS. Within each treatment, background variation 
is represented by the variation of each response about its treatment population mean. 
The treatment population means are unknown, so we assess variation around their estimates, 
the treatment sample means. The ResSS is therefore calculated as the sum of the 
squared differences between each response and its treatment sample mean, written 
algebraically as 

$$\text{ResSS} = \sum_{j=1}^{t} \sum_{k=1}^{n_j} (y_{jk} - \bar{y}_{j\cdot})^2.$$

This can also be obtained by the substitution of $\hat{\mu}_j = \bar{y}_{j\cdot}$ into Equation 4.2. With equal 
replication of all treatments, this becomes 

$$\text{ResSS} = \sum_{j=1}^{t} \sum_{k=1}^{n} (y_{jk} - \bar{y}_{j\cdot})^2.$$

FIGURE 4.4 

Calculation of the treatment sum of squares (TrtSS) for a single factor model with two treatment groups (j = 1, 2) 
and four replicates per group (k = 1 ... 4). Each vertical line represents a difference between a group sample mean 
and the sample grand mean, or $\bar{y}_{j\cdot} - \bar{y}$. 

This calculation is illustrated in Figure 4.5, where again the lengths of the vertical lines 
represent the differences to be squared and then added together. 

The name 'residual sum of squares' arises from a connection with the model residuals, 
$\hat{e}_{jk}$, defined as the discrepancy between the data and the fitted systematic component as 

$$\hat{e}_{jk} = y_{jk} - \hat{y}_{jk} = y_{jk} - \hat{\mu}_j = y_{jk} - \bar{y}_{j\cdot}. \qquad (4.4)$$

It is immediately clear that the ResSS is simply equal to the sum of the squared residuals, i.e. 

$$\text{ResSS} = \sum_{j=1}^{t} \sum_{k=1}^{n_j} (y_{jk} - \bar{y}_{j\cdot})^2 = \sum_{j=1}^{t} \sum_{k=1}^{n_j} \hat{e}_{jk}^2.$$

FIGURE 4.5 

Calculation of the residual sum of squares (ResSS) for a single factor model with two treatment groups (j = 1, 2) 
and four replicates per group (k = 1 ... 4). Each vertical line represents a difference between a response and its 
group mean, or $y_{jk} - \bar{y}_{j\cdot}$. 

ANOVA produces an additive partition of the total variation such that the total sum of 
squares is equal to the sum of the treatment and residual sums of squares, i.e. 

$$\text{TotSS} = \text{TrtSS} + \text{ResSS}. \qquad (4.5)$$

The relationship in Equation 4.5 means that given any two out of the total, treatment or 
residual sums of squares, one can calculate the third quantity directly. For example, the 
residual sum of squares is equal to the total sum of squares minus the treatment sum of 
squares, i.e. ResSS = TotSS - TrtSS. It is straightforward (but fiddly) to verify this relationship 
algebraically, i.e. by a rearrangement of the formula, and this is shown for the interested 
reader in Section C.2. 
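
The additive partition in Equation 4.5 can also be confirmed numerically. The sketch below is an illustration, not a replacement for the algebra in Section C.2: the function `sums_of_squares` is a hypothetical helper that applies the general (unequal replication) formulae for TotSS, TrtSS and ResSS to responses grouped by treatment. It is applied first to the root lengths of Table 4.1, and then to an artificially unbalanced version of the same data in which the last observation for level D is dropped purely to exercise the unequal replication case.

```python
# Root lengths (cm) for the five replicates of each calcium level (Table 4.1).
root_length = {
    "A": [58, 52, 74, 58, 79],
    "B": [80, 68, 72, 74, 85],
    "C": [49, 70, 72, 74, 71],
    "D": [47, 49, 45, 48, 38],
}

def sums_of_squares(data):
    """Return (TotSS, TrtSS, ResSS); replication may differ between treatments."""
    observations = [y for ys in data.values() for y in ys]
    grand_mean = sum(observations) / len(observations)
    tot_ss = sum((y - grand_mean) ** 2 for y in observations)
    trt_ss = 0.0
    res_ss = 0.0
    for ys in data.values():
        trt_mean = sum(ys) / len(ys)
        # Each of the n_j observations contributes the same squared difference.
        trt_ss += len(ys) * (trt_mean - grand_mean) ** 2
        # Squared residuals about the treatment sample mean.
        res_ss += sum((y - trt_mean) ** 2 for y in ys)
    return tot_ss, trt_ss, res_ss

tot_ss, trt_ss, res_ss = sums_of_squares(root_length)
print(round(tot_ss, 2), round(trt_ss, 2), round(res_ss, 2))

# Hypothetical unbalanced data: one observation removed for illustration only.
unbalanced = dict(root_length, D=root_length["D"][:4])
tot_u, trt_u, res_u = sums_of_squares(unbalanced)
print(abs(tot_u - (trt_u + res_u)) < 1e-9)
```

In both cases the treatment and residual sums of squares add up to the total, up to floating-point rounding, so calculating ResSS by subtraction gives the same value as calculating it directly from the residuals.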

EXAMPLE 4.1C: CALCIUM POT TRIAL* 

Calculation of sums of squares by hand is often helped by the use of structured tables. For 
example, to aid in the calculation of the TotSS, it is useful to draw up a table like Table 4.2. 

Note that the treatments here correspond to different levels of the factor Calcium. 

The first three columns list the treatment groups and label the units by treatment (j) 
and replicate within treatment (k). The fourth column lists the responses (root lengths) 
and the fifth column takes the difference between the responses and the sample grand 
mean ($\bar{y}$ = 63.15). The sixth column holds the squares of the differences from the fifth 
column and the total sum of squares can be obtained as the sum of the values in this 
final column, and here is equal to 3684.55. 



TABLE 4.2 

Calculation of Total Sum of Squares for Root Lengths from the Calcium Pot Trial (Example 4.1C) 

Calcium    j    k    $y_{jk}$    $y_{jk} - \bar{y}$    $(y_{jk} - \bar{y})^2$ 
A          1    1       58          -5.15              26.5225 
A          1    2       52         -11.15             124.3225 
A          1    3       74          10.85             117.7225 
A          1    4       58          -5.15              26.5225 
A          1    5       79          15.85             251.2225 
B          2    1       80          16.85             283.9225 
B          2    2       68           4.85              23.5225 
B          2    3       72           8.85              78.3225 
B          2    4       74          10.85             117.7225 
B          2    5       85          21.85             477.4225 
C          3    1       49         -14.15             200.2225 
C          3    2       70           6.85              46.9225 
C          3    3       72           8.85              78.3225 
C          3    4       74          10.85             117.7225 
C          3    5       71           7.85              61.6225 
D          4    1       47         -16.15             260.8225 
D          4    2       49         -14.15             200.2225 
D          4    3       45         -18.15             329.4225 
D          4    4       48         -15.15             229.5225 
D          4    5       38         -25.15             632.5225 
Total      -    -        -           0.00            3684.5500 

TABLE 4.3 

Calculation of Treatment Sum of Squares for Root Lengths from the Calcium Pot Trial 
(Example 4.1C) 

Calcium    j    $n_j$    $\bar{y}_{j\cdot}$    $\bar{y}_{j\cdot} - \bar{y}$    $(\bar{y}_{j\cdot} - \bar{y})^2$    $n_j(\bar{y}_{j\cdot} - \bar{y})^2$ 
A          1      5        64.2           1.05              1.1025              5.5125 
B          2      5        75.8          12.65            160.0225            800.1125 
C          3      5        67.2           4.05             16.4025             82.0125 
D          4      5        45.4         -17.75            315.0625           1575.3125 
Total      -      -          -            0.00            492.5900           2462.9500 



A similar table can be useful in constructing the calculations for the TrtSS (see
Table 4.3). This table has a similar format, but in this case, the differences are between
the treatment means and the sample grand mean. Again, the treatment sum of squares,
TrtSS, is equal to the sum of the values in the final column. In this case, with equal
replication of n = 5, we could equivalently have taken the sum of the values in the
penultimate column (492.59) and multiplied it by the replication to get the same answer, i.e.
TrtSS = 5 × 492.59 = 2462.95.

Finally, calculation of the residual sum of squares is most easily done by subtraction as
ResSS = TotSS - TrtSS = 3684.55 - 2462.95 = 1221.60 .
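The hand calculations in Tables 4.2 and 4.3 can be checked with a few lines of code. The following is a minimal Python sketch using only the root lengths listed in Table 4.2; the function name `sums_of_squares` and the dictionary layout are our own illustrative choices, not part of the original analysis. Here ResSS is computed directly from the within-group deviations, so the subtraction shortcut can be confirmed.

```python
# Root lengths from Table 4.2, keyed by calcium treatment.
root_lengths = {
    "A": [58, 52, 74, 58, 79],
    "B": [80, 68, 72, 74, 85],
    "C": [49, 70, 72, 74, 71],
    "D": [47, 49, 45, 48, 38],
}


def sums_of_squares(groups):
    """Return (TotSS, TrtSS, ResSS) for a single-factor layout."""
    all_obs = [y for ys in groups.values() for y in ys]
    grand_mean = sum(all_obs) / len(all_obs)

    # Total SS: squared deviations of every observation from the grand mean.
    tot_ss = sum((y - grand_mean) ** 2 for y in all_obs)

    trt_ss = 0.0
    res_ss = 0.0
    for ys in groups.values():
        group_mean = sum(ys) / len(ys)
        # Treatment SS: replication times squared deviation of the group mean.
        trt_ss += len(ys) * (group_mean - grand_mean) ** 2
        # Residual SS: squared deviations of observations from their group mean.
        res_ss += sum((y - group_mean) ** 2 for y in ys)
    return tot_ss, trt_ss, res_ss


tot_ss, trt_ss, res_ss = sums_of_squares(root_lengths)
print(f"TotSS = {tot_ss:.2f}, TrtSS = {trt_ss:.2f}, ResSS = {res_ss:.2f}")
```

The three values should agree with those in the text (3684.55, 2462.95 and 1221.60), and TrtSS + ResSS should reproduce TotSS up to rounding error.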



4.3.2 Calculating Degrees of Freedom and Mean Squares 

To make statistical inferences about the sums of squares, we must also consider the amount 
of information used to form them. Each contribution to a sum of squares is a positive value 
(after being squared), so the sum must increase as the number of contributions increases. To 
compare sums of squares, it is therefore necessary to standardize them onto a common scale. 
We do this by considering the amount of information, or degrees of freedom (df), used in 
their construction. A rigorous mathematical definition of degrees of freedom is beyond the 
scope of this book but available elsewhere (e.g. Bailey, 2008) for the interested reader. 

All the sums of squares we consider take the form

$$ SS = \sum (\text{value} - \text{adjustment})^2 , $$

where the summation is across all the observations, and may use several indices. Both the
'value' and the 'adjustment' arise from an estimated model. The degrees of freedom can be
calculated as the (minimum) number of parameters required to calculate the model used for
'values' minus the (minimum) number of parameters required to calculate the model used
for 'adjustment'. So, for example, TotSS was obtained from the deviations of the individual
observations about the sample grand mean. In this case, the 'values' in the SS are the N
individual observations ($n_j$ from each of the treatments), which can only be formed from a
model using N parameters. The adjustment is the sample grand mean, an estimate of a single
parameter (the overall population mean). Therefore, the degrees of freedom for TotSS are



$$ \text{TotDF} = \left( \sum_{j=1}^{t} n_j \right) - 1 = N - 1 . $$

This expression can be written as (n × t) - 1 (or nt - 1) for the case of equal replication.
Similarly, the treatment sum of squares (TrtSS) uses t values, i.e. the treatment sample
means $\bar{y}_{j\cdot}$, again adjusted by the sample grand mean, so the treatment degrees of freedom
becomes

TrtDF = t - 1 .

Finally, the ResSS uses all the individual observations, i.e. N values. The adjustment arises
from a model in which each treatment has a separate mean, which requires t parameters,
one for each treatment. The residual degrees of freedom are therefore

ResDF = N - t .

For equal replication, this can be written as nt - t or (n - 1)t.

Note that the TotDF is partitioned between the treatment and residual degrees of
freedom in a similar way to the TotSS, namely

TotDF = TrtDF + ResDF ,

and that the residual degrees of freedom can be calculated by subtraction as ResDF =
TotDF - TrtDF.

Having calculated the degrees of freedom for each term, we put the treatment and
residual sums of squares onto a common scale by dividing each by their degrees of freedom,
producing ratios known as mean squares. For one-way ANOVA, these calculations are

TrtMS = TrtSS/TrtDF ,

ResMS = ResSS/ResDF .
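To illustrate these formulae, the sketch below computes the degrees of freedom and mean squares for the calcium pot trial, taking the sums of squares from Example 4.1C (TrtSS = 2462.95, ResSS = 1221.60) with t = 4 treatments and N = 20 observations. The function name `mean_squares` and its return layout are hypothetical, chosen only for this illustration.

```python
def mean_squares(trt_ss, res_ss, n_treatments, n_obs):
    """Degrees of freedom and mean squares for a one-way ANOVA."""
    trt_df = n_treatments - 1          # TrtDF = t - 1
    res_df = n_obs - n_treatments      # ResDF = N - t
    tot_df = n_obs - 1                 # TotDF = N - 1
    return {
        "TrtDF": trt_df,
        "ResDF": res_df,
        "TotDF": tot_df,
        "TrtMS": trt_ss / trt_df,      # TrtMS = TrtSS / TrtDF
        "ResMS": res_ss / res_df,      # ResMS = ResSS / ResDF
    }


# Values from Example 4.1C (calcium pot trial).
calcium = mean_squares(trt_ss=2462.95, res_ss=1221.60, n_treatments=4, n_obs=20)
for name, value in calcium.items():
    print(f"{name}: {value:.2f}")
```

Note that the degrees of freedom add up in the same way as the sums of squares (3 + 16 = 19), whereas the mean squares are not additive.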



4.3.3 Calculating Variance Ratios as Test Statistics 

The residual mean square (ResMS), sometimes also called the mean square error (MSE),
provides an unbiased estimate of background variation ($\sigma^2$), with this estimate usually
called $s^2$. Intuitively, the ResMS quantifies the variation within each treatment group,
which should arise from background variation alone, then combines this information
across treatments, giving an estimate of background variation pooled across treatments.

If there are no differences between treatments, then contributions to the treatment mean
square, i.e. differences between the treatment means and the sample grand mean, arise
from background variation alone. In this case, the treatment and residual mean squares
should be of similar sizes, allowing for sampling variation. We can formalize this
comparison by considering the expected value of each of the respective mean squares, which are

$$ E(\text{TrtMS}) = \sigma^2 + \frac{1}{t-1} \sum_{j=1}^{t} n_j (\mu_j - \mu)^2 , $$

$$ E(\text{ResMS}) = \sigma^2 , $$

where $\mu$ is the overall population mean. The expected value of the TrtMS is equal to the
background variation plus a scaled sum of the squared differences between the treatment
population means and the overall population mean. The expected value of the ResMS is
simply equal to the background variation. If there are no differences between treatments,
then the treatment means, $\mu_j$, are equal to the overall mean, $\mu$, and the second term in the
expectation of the TrtMS becomes zero; both mean squares then have the same expected
value. This is the basis of the ANOVA F-test, which tests the null hypothesis of equal
treatment population means, i.e.

$$ H_0: \mu_1 = \mu_2 = \ldots = \mu_t , $$

against the general alternative of some variation within the set of treatment population
means. The test statistic is obtained by dividing the treatment mean square by the residual
mean square, and is known as the variance ratio or observed F-statistic, which we denote
as F, i.e.

F = TrtMS/ResMS .

If the null hypothesis is true, we expect the value of the variance ratio to be close to one. A
larger ratio implies that the variation between treatment means is greater than the
background variation, and is evidence of differences among the treatment population means.
More formally, if the assumptions on the deviations hold and the null hypothesis is true,
such a ratio of two independent mean squares has an F-distribution, and the amount of
evidence can be quantified. The F-distribution depends on two sets of degrees of freedom,
one associated with the numerator in the ratio (here, TrtMS with t - 1 df) and one
associated with the denominator (here, ResMS with N - t df). For clarity, we sometimes specify
the observed variance ratio with its df as subscripts, for example, $F_{t-1,N-t}$. As for the
two-sample t-test (Section 2.4.2), we usually have a pre-determined level of significance for
testing, denoted $\alpha_s$ (usually $\alpha_s = 0.05$). This is a one-sided test, as only large variance ratios
indicate differences among the population means. The critical value required is therefore
the $100(1 - \alpha_s)$th percentile of the F-distribution, denoted $F^{[\alpha_s]}_{t-1,N-t}$, which satisfies

$$ \text{Prob}\left( \mathcal{F}_{t-1,N-t} > F^{[\alpha_s]}_{t-1,N-t} \right) = \alpha_s , $$

where $\mathcal{F}_{t-1,N-t}$ denotes a random variable with an F-distribution on (t - 1) and (N - t) df.
Variance ratios larger than the critical value give evidence (at significance level $\alpha_s$) against
the null hypothesis. Tables of 95th percentiles for F-distributions with a range of
numerator and denominator df are provided in Appendix B, but these are also available in
statistical software. Alternatively, the observed significance level can be calculated as the
proportion of the F-distribution greater than the observed variance ratio, or

$$ P = \text{Prob}\left( \mathcal{F}_{t-1,N-t} > F_{t-1,N-t} \right) . $$

This calculation requires the cumulative distribution function of the F-distribution, which
is also available in statistical software.
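In practice these probabilities come from statistical software. Purely as an illustration of what such software does, the Python sketch below evaluates the upper tail of the F-distribution using only the standard library, through the standard identity linking the F-distribution to the regularized incomplete beta function (evaluated here by a continued fraction). The function names (`f_upper_tail`, `f_critical`) are our own, the bisection search for the critical value is a simple stand-in for a proper quantile routine, and the inputs are the calcium pot trial values (TrtSS = 2462.95 on 3 df, ResSS = 1221.60 on 16 df).

```python
from math import exp, lgamma, log


def _beta_continued_fraction(a, b, x, max_iter=300, eps=1e-12):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    if abs(d) < tiny:
        d = tiny
    d = 1.0 / d
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        odd = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        for coef in (even, odd):
            d = 1.0 + coef * d
            if abs(d) < tiny:
                d = tiny
            c = 1.0 + coef / c
            if abs(c) < tiny:
                c = tiny
            d = 1.0 / d
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b), the regularized incomplete beta function."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def f_upper_tail(f_value, df1, df2):
    """Prob(F > f_value) for an F-distribution on df1 and df2 degrees of freedom."""
    x = df2 / (df2 + df1 * f_value)
    return regularized_incomplete_beta(df2 / 2.0, df1 / 2.0, x)


def f_critical(alpha, df1, df2, lo=0.0, hi=1000.0):
    """Critical value with upper-tail probability alpha, found by bisection."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f_upper_tail(mid, df1, df2) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Calcium pot trial: variance ratio, observed significance level, 5% critical value.
variance_ratio = (2462.95 / 3) / (1221.60 / 16)
p_value = f_upper_tail(variance_ratio, 3, 16)
critical_5 = f_critical(0.05, 3, 16)
print(f"F = {variance_ratio:.3f}, P = {p_value:.5f}, 5% critical value = {critical_5:.3f}")
```

A variance ratio larger than the critical value corresponds to an observed significance level smaller than the chosen level, so the two ways of stating the test always agree.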



4.3.4 The Summary ANOVA Table 

All of these calculations can be neatly summarized in the ANOVA table (see Table 4.4). A
summary ANOVA table can be constructed for any linear model. The treatment sum of
squares (TrtSS) is sometimes known as the 'between' sum of squares, because it quantifies
variation between (among) treatment means. Similarly, the residual sum of squares (ResSS)
is sometimes called the 'within' sum of squares, as it quantifies a pooled measure of
variation within treatment groups. Table 4.4 shows the full form of the one-way ANOVA table.
Where space is limited, we will occasionally omit one or more of the columns, and you
should note that the columns holding the sums of squares and degrees of freedom are
sometimes interchanged.

TABLE 4.4

Structure of the ANOVA Table for a CRD with t Treatments (Factor Treatment) and N Observations
in Total

Source of Variation   df      Sum of Squares   Mean Square             Variance Ratio    P
Treatment             t - 1   TrtSS            TrtMS = TrtSS/(t - 1)   F = TrtMS/ResMS   $\text{Prob}(\mathcal{F}_{t-1,N-t} > F)$
Residual              N - t   ResSS            ResMS = ResSS/(N - t)
Total                 N - 1   TotSS

Prior to interpreting the output from any ANOVA, it is essential to check that the data
do not violate any of the underlying assumptions made about the deviations (Section
4.1), as the validity of the F-test depends upon them. In this context, you should note
that use of randomization in setting up an experiment increases the robustness of the
F-test - this is discussed further in Section 5.2.4. The true values of the deviations are
not known, but we have the residuals (defined in Equation 4.4) as estimates of these
values. Validation methods that use residuals are known as diagnostic tools and are
described in Chapter 5. Once the assumptions have been checked and found to be
reasonable, within certain limits, we can then examine the results from the ANOVA and
begin to draw conclusions.

EXAMPLE 4.1D: CALCIUM POT TRIAL* 

The calculations in Example 4.1C can be combined to construct the ANOVA table in
Table 4.5. Residual plots for these data can be seen in Section 5.2, where they are
discussed in some detail. For the moment, we merely state that the residual plots are
reasonably consistent with the assumptions underlying the analysis. The 5% critical value
of the relevant F-distribution ($F^{[0.05]}_{3,16}$) is 3.239 (Table B.1) and the 1% critical value ($F^{[0.01]}_{3,16}$)
is 5.292. Both of these values are considerably smaller than the observed variance ratio,
$F_{3,16} = 10.753$. The observed significance level can be obtained as the proportion of the
F-distribution with 3 and 16 df greater than the observed variance ratio. In this example,
this is P = 0.00041 (often reported as P < 0.001 in statistical software). Hence, we reject
the null hypothesis and conclude that there is very strong evidence that the population
means are not all equal, indicating that there is some effect of calcium concentration on
root growth. The next step in the analysis is to interpret this effect.



TABLE 4.5

ANOVA Table for Root Lengths from the Calcium Pot Trial with Four Treatments (Factor Calcium)
(Example 4.1D)

Source of Variation   df   Sum of Squares   Mean Square   Variance Ratio   P
Calcium                3   2462.95          820.98        10.753           < 0.001
Residual              16   1221.60           76.35
Total                 19   3684.55

Having constructed the ANOVA table, calculated the variance ratio, and compared
this value to the appropriate F-distribution, we know whether there is evidence to
support rejection of the null hypothesis that all treatments have the same population mean.
If we have a significant result for the F-test, then we should examine the patterns in
the treatment sample means. If the F-test result is not significant, then the analysis is
complete and we do not have evidence to reject the null hypothesis. It is often worth
considering whether a more complex treatment structure is present, requiring a more
complicated form of explanatory model (see Chapter 8). If differences between
treatment sample means appear large from a biological point of view but the F-test is not
significant, this may indicate that the experiment was not sufficiently large or precise to
be able to detect these differences as statistically significant. This relates to the power of
the experiment, which can generally be increased by adding more replicates across the
whole experiment. Approaches for determining the replication level required to detect
a given size of treatment difference for a given amount of background variation are
described in Chapter 10.



4.4 Evaluating the Response to Treatments 
4.4.1 Prediction of Treatment Means 

If the fitted model gives a good representation of the observations, then the best prediction of
the population mean for the jth treatment comes from the parameter estimates (Section 4.2) as

$$ \hat{\mu}_j = \bar{y}_{j\cdot} , $$

i.e. the best prediction of the treatment population mean is the treatment sample mean.
If we reject the null hypothesis, having obtained a significant F-test result, and we have
checked the validity of our model assumptions (Chapter 5), then we can examine the table
of sample means to identify the source(s) and size(s) of any treatment differences
associated with this significant test result. It is important to realize that statistical significance
is not the same as biological significance - with sufficient replication, it is possible to find
statistically significant differences that are too small to have any real biological meaning -
and so it is important to also consider the biological significance of any statistically
significant comparison.

To make statistical inferences about the treatment population means, we need a measure
of the uncertainty associated with our estimates of these values, the treatment sample
means. This uncertainty is measured by the variance of these estimates, usually expressed
on the SD scale (i.e. as the square root of the variance) and commonly referred to as the
standard error of the mean. We use the general notation $SE(\hat{\mu}_j)$ to denote the SE of such
estimates. For the jth treatment with $n_j$ replicate observations, the variance of the estimated
mean is $\sigma^2/n_j$ (as introduced in Section 2.2.3) and the SE is the square root of this variance,
i.e.

$$ SE(\hat{\mu}_j) = \sqrt{\frac{\sigma^2}{n_j}} . $$

In fact, the true value of the background variation, $\sigma^2$, is unknown, and so we replace it
with our best estimate, $s^2$ (equal to the ResMS), to get an estimate of this SE, written as

$$ SEM_j = \widehat{SE}(\hat{\mu}_j) = \sqrt{\frac{s^2}{n_j}} . $$

In general, for simplicity, we use the notation SE to denote the estimate, $\widehat{SE}$, and use $SEM_j$
to denote this SE for the estimate of the population mean for the jth treatment. The
subscript can be dropped when all treatments are equally replicated and the SEMs are all
equal.

A $100(1 - \alpha_s)\%$ CI for the population mean of the jth treatment can be constructed as

$$ \left( \hat{\mu}_j - \left[ t^{[\alpha_s/2]}_{N-t} \times SEM_j \right] , \; \hat{\mu}_j + \left[ t^{[\alpha_s/2]}_{N-t} \times SEM_j \right] \right) , $$

where $t^{[\alpha_s/2]}_{N-t}$ is the $100(1 - \alpha_s/2)$th percentile of the Student's t-distribution with df equal to
the residual degrees of freedom from the ANOVA (here N - t). The confidence limits can
alternatively be expressed as

$$ \hat{\mu}_j \pm \left[ t^{[\alpha_s/2]}_{N-t} \times SEM_j \right] . $$



EXAMPLE 4.1E: CALCIUM POT TRIAL* 

From the ANOVA table obtained above (Example 4.1D), we have ResMS = $s^2$ = 76.35. The
sample means for this study were shown in Table 4.1. Each treatment has five
observations, n = 5, so the standard errors of the predicted means are all equal to

$$ SEM = \sqrt{\frac{76.35}{5}} = \sqrt{15.27} = 3.91 . $$

The 97.5th percentile of the t-distribution with 16 df is $t^{[0.025]}_{16} = 2.120$. Using the estimate
for treatment A obtained in Example 4.1B, $\widehat{Calcium}_1 = 64.2$, we calculate a 95% CI for this
treatment as

$$ \widehat{Calcium}_1 \pm \left( t^{[0.025]}_{16} \times SEM \right) = 64.2 \pm (2.120 \times 3.91) = 64.2 \pm 8.29 , $$

giving the 95% CI as (55.9, 72.5).
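The same calculation can be sketched in Python. The inputs (ResMS = 76.35, n = 5, the tabulated percentile 2.120 and the treatment A mean of 64.2) are taken from the example; the function name `mean_confidence_interval` is a hypothetical helper introduced only for illustration, and the t percentile is supplied by hand rather than computed.

```python
from math import sqrt


def mean_confidence_interval(mean, res_ms, n_reps, t_percentile):
    """Return (SEM, lower limit, upper limit) for a treatment mean."""
    sem = sqrt(res_ms / n_reps)        # SEM = sqrt(s^2 / n)
    half_width = t_percentile * sem    # t percentile x SEM
    return sem, mean - half_width, mean + half_width


# Treatment A of the calcium pot trial; 2.120 is the tabulated 97.5th
# percentile of the t-distribution with 16 df quoted in the example.
sem, lower, upper = mean_confidence_interval(
    mean=64.2, res_ms=76.35, n_reps=5, t_percentile=2.120
)
print(f"SEM = {sem:.2f}, 95% CI = ({lower:.1f}, {upper:.1f})")
```

Because the code carries the unrounded SEM through the calculation, the half-width differs from the hand value of 8.29 in the second decimal place, but the limits agree with the example when rounded to one decimal place.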



4.4.2 Comparison of Treatment Means 

Usually, one of the aims of a study is the comparison of the population means for
different treatments. The best estimate of a difference in population means is the difference in
sample means, for example, for the jth and kth treatments

$$ \hat{\mu}_j - \hat{\mu}_k = \bar{y}_{j\cdot} - \bar{y}_{k\cdot} . $$

The standard error of this difference between the predicted population mean of the jth
treatment, with replication $n_j$, and the kth treatment, with replication $n_k$, (also presented in
Section 2.4.2) takes the form

$$ SE(\hat{\mu}_j - \hat{\mu}_k) = \sqrt{\sigma^2 \left( \frac{1}{n_j} + \frac{1}{n_k} \right)} . $$

Again, we estimate this SE using $s^2$ in place of the unknown background variance,
written as

$$ SED_{jk} = \widehat{SE}(\hat{\mu}_j - \hat{\mu}_k) = \sqrt{s^2 \left( \frac{1}{n_j} + \frac{1}{n_k} \right)} . $$

We thus use $SED_{jk}$ to denote the SE for the estimated difference between the population
means for the jth and kth treatments. The subscripts can be dropped when all treatments
are equally replicated and the SEDs are all equal with

$$ SED = \sqrt{\frac{2 s^2}{n}} . $$

Note that the SED, the standard error of a difference between predictions, is always larger
than the SEM for a single prediction because the SED contains uncertainty associated with
the estimation of two population means.

Under the null hypothesis that the population means for the two treatments are equal,
i.e. $H_0: \mu_j = \mu_k$, the statistic

$$ \frac{\hat{\mu}_j - \hat{\mu}_k}{SED_{jk}} = \frac{\bar{y}_{j\cdot} - \bar{y}_{k\cdot}}{SED_{jk}} $$

has a t-distribution with degrees of freedom equal to the residual df, here N - t. The statistic
can be used to test this null hypothesis against the two-sided alternative $H_1: \mu_j \neq \mu_k$ by
comparing it with critical values of that t-distribution (Section 2.4.2). The differences between
several pairs of treatment means can be evaluated in this way; however, in general, the
testing of many pairwise comparisons without taking precautions with regard to overall levels
of significance can give misleading results - we return to this topic in Section 8.8.

From the form of this two-sample t-test, it follows that the smallest absolute difference
between two treatment sample means that would result in a statistically significant
two-sided t-test for the difference between the corresponding population means can be
calculated as

$$ LSD_{jk} = t^{[\alpha_s/2]}_{N-t} \times SED_{jk} , $$

where $LSD_{jk}$ denotes this least significant difference (LSD) between the jth and kth
treatment means. A $100(1 - \alpha_s)\%$ CI for the difference in population means between the jth and
kth treatments can be computed in terms of this LSD as

$$ \left( (\hat{\mu}_j - \hat{\mu}_k) - LSD_{jk} , \; (\hat{\mu}_j - \hat{\mu}_k) + LSD_{jk} \right) . $$

Because of the connection with the t-test, if the CI for a difference between the population
means for two treatments does not include zero, then there is statistical evidence for a
significant difference (at level $\alpha_s$) between the means for these two treatments.

EXAMPLE 4.1F: CALCIUM POT TRIAL*

For this study, we want to know whether a small increase in calcium (treatment B) has
any impact on root growth compared with the standard (treatment A). To calculate the
SED between these two treatment means requires the ResMS ($s^2 = 76.35$) and residual
df (ResDF = 16) from the ANOVA table (Table 4.5). Each treatment has five observations,
n = 5, so there is also a common SED for comparing any pair of treatments, i.e.

$$ SED = \sqrt{\frac{2 \times 76.35}{5}} = \sqrt{30.54} = 5.53 . $$

The tabulated 97.5th percentile of the t-distribution with 16 df is $t^{[0.025]}_{16} = 2.120$, so at a 5%
significance level, the LSD is calculated as

$$ LSD = t^{[0.025]}_{16} \times SED = 2.120 \times 5.53 = 11.72 . $$

Hence, any difference between treatment means greater than 11.72 is significant at the
5% level. The predicted difference between treatments B and A can be obtained from
Example 4.1B as $\widehat{Calcium}_2 - \widehat{Calcium}_1 = 75.8 - 64.2 = 11.6$, slightly smaller than the LSD,
and the 95% CI for the difference between these treatments is

$$ (\widehat{Calcium}_2 - \widehat{Calcium}_1) \pm LSD = (75.8 - 64.2) \pm 11.72 = 11.6 \pm 11.72 , $$

giving the 95% CI as (-0.1, 23.3). As expected from the comparison between the
treatment means with the LSD, zero is contained in this CI, confirming the conclusion that
these treatments are not statistically different at a significance level of 5%.
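The SED, LSD and interval for the B versus A comparison can be reproduced with the short Python sketch below. It assumes equal replication and again takes the tabulated t percentile (2.120) as given; the function name `compare_means` and its return layout are hypothetical helpers, not part of the original analysis.

```python
from math import sqrt


def compare_means(mean_j, mean_k, res_ms, n_reps, t_percentile):
    """SED, LSD and CI for a difference between two equally replicated means."""
    sed = sqrt(2.0 * res_ms / n_reps)   # SED = sqrt(2 s^2 / n)
    lsd = t_percentile * sed            # LSD = t percentile x SED
    diff = mean_j - mean_k
    return {
        "diff": diff,
        "SED": sed,
        "LSD": lsd,
        "CI": (diff - lsd, diff + lsd),
        "significant": abs(diff) > lsd,  # two-sided test at the chosen level
    }


# Treatment B versus treatment A in the calcium pot trial.
b_vs_a = compare_means(mean_j=75.8, mean_k=64.2, res_ms=76.35,
                       n_reps=5, t_percentile=2.120)
print(f"SED = {b_vs_a['SED']:.2f}, LSD = {b_vs_a['LSD']:.2f}")
print(f"95% CI = ({b_vs_a['CI'][0]:.1f}, {b_vs_a['CI'][1]:.1f}), "
      f"significant: {b_vs_a['significant']}")
```

The comparison `abs(diff) > lsd` and the check of whether the interval excludes zero are the same test expressed in two ways, which is why they always lead to the same conclusion.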



It is often more informative to consider the overall pattern in the treatment means, as 
shown in Figure 4.6, rather than to make numerous pairwise treatment comparisons. 
Figure 4.6 indicates an initial increase in root growth as calcium increases, then a clear 
decrease at the highest level. The LSD bar indicates the likely size of differences caused by 
background variation. These results could be used to suggest rough limits on calcium con- 
centrations likely to be beneficial to root growth that could be verified using field studies. 
Given the numerical relationship between the calcium factor levels, we might also try to 
model the pattern by considering the calcium content as a quantitative explanatory vari- 
able (recall Examples 1.1 and 1.2), and we describe this type of model in Section 8.7. 

Note that although there is a superficial similarity between Figures 4.6 and 4.1, there
are important differences between the two plots. Both presentations show the treatment
sample means, but they differ in their presentations of uncertainty. Figure 4.1 shows
only the unbiased sample SD for each treatment, giving some indication of the
within-group variability for each treatment. This is useful as a precursor to ANOVA in
assessing the assumption of homogeneity of variance (although formal tests can also be used,
see Section 5.3). On the other hand, Figure 4.6 shows the LSD, based on an estimate of
background variation pooled across the treatment groups, which gives a direct and more
appropriate measure for statistical comparison of treatment population means as a result
of the ANOVA.

FIGURE 4.6

Estimated LSD and treatment means for the calcium pot trial (see Examples 4.1B and F).



4.5 Alternative Forms of the Model 

In the preceding sections, we have used the simplest form of the model for a single factor,
i.e.

$$ y_{jk} = \mu_j + e_{jk} , $$

using a single parameter to represent the population mean for each treatment. Statistical
packages generally use other forms, or parameterizations, of the model, and we give an
introduction to these forms here. As long as the forms are equivalent, the same ANOVA
table and the same conclusions will be obtained. One widely used form writes the single
factor model as

$$ y_{jk} = \mu + \tau_j + e_{jk} , \qquad (4.6) $$

where $\mu$ is the overall population mean across all treatments, and $\tau_j$ is the unknown
population treatment effect for the jth group, i.e. the difference between the population mean
for the jth treatment and the overall mean, which can be positive or negative.

The explanatory component can then be written in symbolic form as

Explanatory component: [1] + Treatment

Here, the term [1] denotes a factor that takes value 1 everywhere, and is associated with
the overall mean, $\mu$.

This model has t + 1 parameters - one for each treatment group plus the overall mean.
However, as the structure can be described with only t parameters, i.e. one for each group,
the version in Equation 4.6 is called over-parameterized. The consequence of
over-parameterization is that estimation of the parameters by the least squares principle does not
result in a unique solution, and so we impose constraints to obtain a unique solution. To
keep the interpretation of $\mu$ as the overall mean, the average treatment effect must be zero;
hence we impose the constraint $\sum_{j=1}^{t} \tau_j = 0$, i.e. the sum (and hence mean) of the treatment
effects is zero (sometimes called the sum-to-zero constraint). The parameter estimates
then take the form

$$ \hat{\mu} = \bar{y} , $$

$$ \hat{\tau}_j = \bar{y}_{j\cdot} - \bar{y} , $$

so the overall population mean $\mu$ is estimated by the sample grand mean, and the
population treatment effects $\tau_j$ are estimated by the differences between the treatment sample
means and the sample grand mean. The fitted values then take the form

$$ \hat{y}_{jk} = \hat{\mu} + \hat{\tau}_j = \bar{y}_{j\cdot} , $$

and hence are equal to the treatment sample means, exactly as before. In this
parameterization, we note that the treatment sum of squares is simply a sum of squares of the
estimated treatment effects, i.e.

$$ \text{TrtSS} = \sum_{j=1}^{t} n_j (\bar{y}_{j\cdot} - \bar{y})^2 = \sum_{j=1}^{t} n_j \hat{\tau}_j^2 . $$

This parameterization is used in the GenStat ANOVA algorithm.

Other parameterizations can be used, but here we just consider variations on one
of the most common, often called the corner-point constraint or first-level-zero
constraint, written as

$$ y_{jk} = \mu_1 + \nu_j + e_{jk} . $$

In this parameterization, the constraint $\nu_1 = 0$ is used. The term $\mu_1$ then represents the
population mean for the first treatment, and the effect $\nu_j$ represents the difference between
the population mean for the jth treatment and that of the first treatment. Here, the least
squares estimates take the form

$$ \hat{\mu}_1 = \bar{y}_{1\cdot} , $$

$$ \hat{\nu}_j = \bar{y}_{j\cdot} - \bar{y}_{1\cdot} . $$

So the parameter $\mu_1$ is estimated by the sample mean for the first treatment, and the effects
$\nu_j$ are estimated by differences between the sample means for the jth treatment and the
first treatment. However, the fitted values, calculated as

$$ \hat{y}_{jk} = \hat{\mu}_1 + \hat{\nu}_j = \bar{y}_{j\cdot} , $$

are again equal to the treatment sample means, matching previous forms of the model.
This parameterization is used in the GenStat regression algorithm (commands MODEL and
FIT) and the lm and aov algorithms implemented in R.

Another version of this parameterization uses last-level-zero constraints, taking the
form

$$ y_{jk} = \mu_t + \omega_j + e_{jk} , $$

with constraint $\omega_t = 0$; then the $\omega_j$, j = 1 ... t - 1, represent differences with the last
treatment. Again, although individual parameter estimates differ from previous forms, the
fitted values are the same. This parameterization is used by the PROC GLM procedure in SAS and can
be used in the GenStat regression algorithm (commands MODEL and FIT) if the last factor
level is chosen as the reference level.

All of the above forms are specific to the linear model with a single factor, and must
be extended for more complex models; the principles are the same however. Details for
more complex models are given in Chapters 7, 8 and 11. In general, although the value and
interpretation of individual parameter estimates depend on the parameterization used,
the fitted values and predictions for population treatment means are unchanged. For this
reason, we usually make inferences for the population means, which we still denote as $\mu_j$,
j = 1 ... t, rather than for the individual model parameters.
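The equivalence of the three parameterizations can be seen numerically. The Python sketch below uses the calcium treatment sample means (64.2, 75.8, 67.2, 45.4) and forms the estimates under the sum-to-zero, first-level-zero and last-level-zero constraints. The variable names are our own, and the grand mean is computed as the average of the treatment means, which is valid here only because the replication is equal.

```python
# Treatment sample means from the calcium pot trial (equal replication, n = 5).
sample_means = {"A": 64.2, "B": 75.8, "C": 67.2, "D": 45.4}
levels = list(sample_means)

# With equal replication the sample grand mean is the average of the means.
grand_mean = sum(sample_means.values()) / len(sample_means)

# Sum-to-zero: overall mean plus treatment effects that sum to zero.
tau = {lev: sample_means[lev] - grand_mean for lev in levels}
fitted_sum_to_zero = {lev: grand_mean + tau[lev] for lev in levels}

# First-level-zero (corner-point): first treatment mean plus differences from it.
first = levels[0]
nu = {lev: sample_means[lev] - sample_means[first] for lev in levels}
fitted_first_zero = {lev: sample_means[first] + nu[lev] for lev in levels}

# Last-level-zero: last treatment mean plus differences from it.
last = levels[-1]
omega = {lev: sample_means[lev] - sample_means[last] for lev in levels}
fitted_last_zero = {lev: sample_means[last] + omega[lev] for lev in levels}

for lev in levels:
    print(f"{lev}: tau = {tau[lev]:7.2f}, nu = {nu[lev]:7.2f}, "
          f"omega = {omega[lev]:7.2f}, fitted = {fitted_sum_to_zero[lev]:.1f}")
```

The individual estimates differ between the three forms, but each set of fitted values reproduces the treatment sample means, which is why inferences are usually made on the means rather than on the parameters.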

EXERCISES 

4.1* A glasshouse experiment to evaluate control of a weed species by three different
chemical treatments used a CRD with seven replicates of each treatment.

a. What are the null and alternative hypotheses for this experiment?

b. Construct the ANOVA table given that TrtSS = 121.5 and ResSS = 87.4.

c. What is the appropriate F-distribution for the variance ratio under the null
hypothesis? What is the 5% critical value from this distribution?

d. Would we accept or reject the null hypothesis?

4.2 A laboratory experiment investigated the effect of different treatments on
grain production in wheat ears infected with Fusarium graminearum (Baldwin
et al., 2010). Single wheat ears on 30 separate plants were inoculated with
F. graminearum. Four treatments (labelled A-D) and a negative (untreated)
control were then allocated to the inoculated ears as a CRD. The number of
grains in the region above the inoculation position of each ear was counted.
File GRAINS.DAT contains the unit number (DEar), the treatment applied (factor
Treatment) and the number of grains (variate Grains) for each ear.*

a. Write down a mathematical model for the numbers of grains.

b. Write down the null and alternative hypotheses associated with this
experiment.

c. Construct an ANOVA table by calculating the total, treatment and residual
sums of squares and df and then deriving the other columns. Is there any
evidence that grain production is affected by the treatments?

d. Calculate the predicted mean for each treatment group and the SED and
LSD for treatment comparisons.

e. State your conclusions from this analysis.

(We re-visit these data in Exercises 5.1 and 5.2.)

* Data from K. Hammond-Kosack, Rothamsted Research.

4.3 An experiment was done to assess whether fungal infection affected aphid
reproduction. Sixty adult aphids were equally divided among three treatment
groups, which were either inoculated with a fungus (either Beauveria bassiana
or Pandora neoaphidis) or not inoculated. One first-generation (FG) nymph was
taken from each adult; these nymphs were placed individually into Petri dishes
which were then arranged randomly within a controlled environment
chamber. The development time for each FG aphid was observed and the number of
nymphs produced by each FG aphid during a time equal to its own
development time was counted. Some FG aphids died before producing nymphs and
were removed from the experiment. File FUNGUS.DAT contains the unit numbers
(DAphid), the treatments (factor Fungus) applied, and the numbers of nymphs
(variate Nymphs) produced by the 33 remaining FG aphids.*

a. Write down a mathematical model for the aphid counts.

b. Write down the null and alternative hypotheses associated with this
experiment.

c. Construct an ANOVA table by calculating the total, treatment and residual
sums of squares and df and then deriving the other columns. Is there any
evidence that reproduction of FG progeny is affected by fungal infection of
the original adult aphids?

d. Calculate the estimated mean, with a 95% confidence interval, for each
treatment group.

e. State your conclusions from this analysis.

f. Comment on whether omitting the FG aphids that died might bias the
results - what assumptions has your analysis made?

(We re-visit these data in Exercises 5.1, 5.2 and 5.4.)

* Data from J. Baverstock, Rothamsted Research.

4.4 An experiment investigated the effect of conidia density on transmission of a
fungus that attacks aphids. Cadavers of aphids killed by the fungus, and from
which the fungus was releasing spores, were placed on bean plants at three
densities (A = 1, B = 5 or C = 10 cadavers per plant) to give different doses of
fungal conidia. The densities were allocated to individual bean plants as a CRD
with six replicates. Twenty uninfected live aphids were placed on each plant
with one ladybird which was allowed to forage to facilitate transfer of conidia
between the cadavers and the live aphids. For each plant, the proportion of
aphids that became infected after 7 days was recorded and transformed to the
logit scale for analysis (see Chapter 6). The unit numbers (DPlant), treatment
allocations (factor Density) and transformed responses (variate LogitP) are in
file TRANSMISSION.DAT.*

a. Write down the null and alternative hypotheses associated with this
experiment.

b. Obtain the ANOVA table. Is there any evidence that the density of fungal
conidia affects the rate of transmission of the fungus to the aphids?

c. Plot the predicted means for each density with the LSD. What does this plot
suggest?

d. State your conclusions from this analysis.

(We re-visit these data in Exercise 5.2.)

* Data from J. Pell, Rothamsted Research.

4.5 A variety of maize was genetically modified, and plants were classified as
homozygous, heterozygous or null according to the number of glutamine mutants
present (2, 1 or 0, respectively; Haines, 2000). The dry weights (g) of single
kernels from each of 10 plants of each type, sampled at random, were recorded.
The unit number (DKernel), classification (factor Type) and dry weight (variate
Weight) for each kernel are in file MAIZE.DAT.

a. Write down the null and alternative hypotheses associated with this
experiment.

b. Obtain the ANOVA table. Is there any evidence that mean kernel weights
differ among the different genetic types?

c. Plot the predicted means for each genetic type with the LSD. What genetic
hypothesis does this plot suggest?

d. State your conclusions from this analysis.

(We re-visit these data in Exercise 5.2.)
In Chapter 4, we introduced a simple additive linear model to describe the effects of a
single explanatory factor. The ensuing statistical analysis is based on the assumption that
this additive form of model is true, and also on the assumptions about the properties of the
model deviations given in Section 4.1. Conclusions from the statistical analysis are valid
only if these assumptions are consistent with the properties of the data. In this chapter, we
describe some simple diagnostic tools that can be used to check the assumptions underlying
analysis of variance.

The term diagnostic tools refers to a collection of techniques used to detect inconsistencies
between the statistical model and the data. Diagnostic tools are used primarily
to check that the assumptions underlying the analysis are not violated and that unusual
individual data values do not unduly affect the fit of the model and hence any inferences
drawn from it. If you find problems, then you can take corrective measures, such
as transforming the data (see Chapter 6). Diagnostics for linear models take two main
forms: analysis of the properties of the residuals and computation of influence statistics.
In this chapter, we concentrate on the former; influence statistics, and related diagnostic
tools more relevant to regression models, are presented in Chapter 13. However, all of the
diagnostic tools discussed in this chapter are applicable to assess the assumptions for any
linear model.

First, we describe two of the most commonly used forms of residual (Section 5.1). We
then discuss graphical diagnostic tools (residual plots) for inspecting the residuals and
checking the model assumptions (Section 5.2). We also briefly describe permutation tests,
which can be used when the assumption of a Normal distribution for the deviations is
not plausible (Section 5.2.4). We then describe one formal test, Bartlett's test, for checking
homogeneity of variances between treatment groups (Section 5.3). We end this chapter
with a short discussion of how to identify and deal with outliers (Section 5.4).



5.1 Estimating Deviations 

To examine whether the assumptions made about the deviations are plausible, we 
would ideally like to examine them directly. As this is not possible (because they are not 
known), we examine estimates of the deviations, called the residuals. Unfortunately, 
even if the assumptions underlying the model are true, the statistical properties of the 
residuals are not exactly the same as those of the deviations. For this reason, different 
types of residuals have been developed to examine different aspects of the distributions 
of deviations. Here, we describe simple and standardized residuals; later, in Chapter 13, 
we introduce prediction and deletion residuals, which are particularly useful in regres- 
sion analysis. 

5.1.1 Simple Residuals

In Equation 4.4, a residual from the model fitted for a CRD was defined as the discrepancy
between the response and the fitted systematic component. This is the definition of
a simple (or ordinary) residual, and can be easily extended for any linear model. Using
a general notation that labels observations by index i for i = 1 ... N, we define the simple
residual for the ith observation as

$$\hat{e}_i = y_i - \hat{y}_i,$$

where $y_i$ is the response and $\hat{y}_i$ is the fitted value for that observation. As in previous
chapters, the 'hat' notation denotes an estimate; the residuals $\hat{e}_i$ are estimates of the unknown
deviations $e_i$. In this general notation, the subscript i refers to the ith observation, but recall
that it is often convenient to relabel the units to reflect the structure of a specific data set,
for example, to use $\hat{e}_{jk}$ for a CRD to represent the simple residual for the kth replicate of the
jth treatment, as in Equation 4.4.

EXAMPLE 5.1A: CALCIUM POT TRIAL* 

Recall the calcium pot trial analysed in Example 4.1 (data in file calcium.dat). Figure 
5.1a shows the fitted values, here the treatment means, and the observed responses plot- 
ted against treatment group. The value of the simple residual for each observation is 
equal to the vertical distance between the observation and its group mean, indicated 
for the second smallest response in treatment A by a dotted line. Observations larger 
than the group mean have positive residuals; those smaller than the group mean have 
negative residuals. Figure 5.1b shows the simple residuals plotted by treatment group. 

The residuals show the same pattern as the observations within each treatment, but the 
groups are now centered about zero rather than about the treatment means. There is a 
suggestion in Figure 5.1b that the variance differs between treatments; this is investi- 
gated further in Section 5.3. 





FIGURE 5.1 

(a) Observed (o) and fitted (•) values (root lengths, cm) from the calcium pot trial (Example 5.1A). (b) Simple 
residuals for all pots in the calcium pot trial. Vertical dotted line indicates the simple residual for the second 
smallest response in treatment A. 
For a model with a single explanatory factor, such as the CRD, the residuals within each
group must sum to zero. Recall that in the CRD we label observations $y_{jk}$ by their treatment
group (index j) and replicate (index k). From Equation 4.4, we know that the residuals take
the form

$$\hat{e}_{jk} = y_{jk} - \bar{y}_{j\cdot},$$

where $\bar{y}_{j\cdot}$ is the sample mean for the jth treatment group. The sum of the residuals in this
group is therefore calculated as

$$\sum_{k=1}^{n_j} \hat{e}_{jk} = \sum_{k=1}^{n_j} (y_{jk} - \bar{y}_{j\cdot}) = \sum_{k=1}^{n_j} y_{jk} - n_j \bar{y}_{j\cdot} = n_j \bar{y}_{j\cdot} - n_j \bar{y}_{j\cdot} = 0.$$

In other words, the residuals sum to zero because the sum of the observations is equal
to the sum of the fitted values. Summing to zero is a constraint that means that residuals
within treatment groups are not independent even when the model deviations are truly
independent. Similar constraints across treatment groups and other structures also hold,
even for more complex models. The sum of the complete set of residuals across the experiment
must also be equal to zero.
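
The sum-to-zero constraint is easy to confirm numerically. The following is a minimal sketch, assuming a one-factor (CRD) layout held as two parallel lists; the function name `simple_residuals` and the small data set are hypothetical and are not taken from the book's files.

```python
from collections import defaultdict


def simple_residuals(treatments, responses):
    """Return the simple residuals y_jk - ybar_j for a one-factor (CRD) layout."""
    groups = defaultdict(list)
    for trt, y in zip(treatments, responses):
        groups[trt].append(y)
    # Fitted values for the CRD are the treatment sample means.
    means = {trt: sum(ys) / len(ys) for trt, ys in groups.items()}
    return [y - means[trt] for trt, y in zip(treatments, responses)]


# Hypothetical data (illustration only): two treatments, unequal replication.
treatments = ["A", "A", "A", "B", "B"]
responses = [4.0, 6.0, 8.0, 10.0, 14.0]

residuals = simple_residuals(treatments, responses)
print(residuals)            # [-2.0, 0.0, 2.0, -2.0, 2.0]
print(sum(residuals[:3]))   # residuals in group A sum to zero
print(sum(residuals[3:]))   # residuals in group B sum to zero
```

Because each group's residuals are forced to sum to zero, knowing all but one residual in a group determines the last one, which is the dependence described above.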

A general expression for the variances of residuals is given in Section 13.4.2. For the
CRD, the variance of the residuals is directly related to the replication within each treatment
group, as

$$\mathrm{Var}(\hat{e}_{jk}) = \sigma^2 \, \frac{(n_j - 1)}{n_j}.$$

The variances of the residuals are therefore all equal for the CRD when the treatment
groups have equal replication, but differ between treatment groups otherwise. As usual,
we can estimate this quantity by replacing the unknown variance, $\sigma^2$, by its estimate
$s^2$ = ResMS (Section 4.3).
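
For the CRD, the variance of a simple residual is $\sigma^2 (n_j - 1)/n_j$. The short derivation below is a sketch of why this holds, assuming only that the deviations are independent with common variance $\sigma^2$ (Section 4.1); it is not reproduced from Section 13.4.2. It uses the fact that $y_{jk}$ contributes a fraction $1/n_j$ to its own group mean, so that $\mathrm{Cov}(y_{jk}, \bar{y}_{j\cdot}) = \sigma^2/n_j$.

```latex
\begin{align*}
\mathrm{Var}(\hat{e}_{jk})
  &= \mathrm{Var}(y_{jk} - \bar{y}_{j\cdot}) \\
  &= \mathrm{Var}(y_{jk}) + \mathrm{Var}(\bar{y}_{j\cdot})
     - 2\,\mathrm{Cov}(y_{jk}, \bar{y}_{j\cdot}) \\
  &= \sigma^2 + \frac{\sigma^2}{n_j} - \frac{2\sigma^2}{n_j}
   = \sigma^2 \, \frac{(n_j - 1)}{n_j}.
\end{align*}
```

The residual variance is therefore always smaller than $\sigma^2$, and approaches it as the replication $n_j$ increases.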



5.1.2 Standardized Residuals 

Standardized residuals are used to deal with the problem of unequal variances within a
set of simple residuals. A standardized residual is defined as the simple residual divided
by an estimate of its standard error. We denote the standardized residual here as $r_i$, with

$$r_i = \frac{\hat{e}_i}{\widehat{\mathrm{SE}}(\hat{e}_i)},$$

where $\widehat{\mathrm{SE}}(\hat{e}_i)$ is the estimated standard error of the simple residual for the ith observation
(see Section 13.4.2 for details). The standardized residuals have a common variance equal
to one (unit variance), but are not independent as they are subject to the same constraints
as the simple residuals. For reasonably large data sets, most of the standardized residuals
should fall within the range ±2, and individual points outside this band may be investigated
as potential outliers (see Section 5.4).
For the CRD, the standardized residuals (labelled by treatment group j and replicate
number k) are thus

$$r_{jk} = \frac{\hat{e}_{jk}}{s \sqrt{(n_j - 1)/n_j}},$$

where s is the square root of the residual mean square. When all groups have equal replication,
so all $n_j$ are equal to a common value n, the set of standardized residuals is simply a
scaled version of the set of simple residuals.

Note that there is no common nomenclature for residuals: the standardized residuals
defined here are called 'internally Studentized residuals' or 'Studentized residuals' in
some statistical texts and software. You should therefore make sure you understand the
definition of the residuals being used in any context.
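
As an illustration of the definition for the CRD, the sketch below divides each simple residual by s multiplied by the square root of (n_j - 1)/n_j, where s squared is the residual mean square. The function name and the data are hypothetical, and the code assumes every treatment group has at least two replicates (a group with a single observation has zero residual variance and cannot be standardized).

```python
import math
from collections import defaultdict


def standardized_residuals(treatments, responses):
    """Standardized residuals for a CRD: e_jk / (s * sqrt((n_j - 1) / n_j))."""
    groups = defaultdict(list)
    for trt, y in zip(treatments, responses):
        groups[trt].append(y)
    means = {trt: sum(ys) / len(ys) for trt, ys in groups.items()}
    simple = [y - means[trt] for trt, y in zip(treatments, responses)]

    # Residual mean square: residual SS divided by residual df (N - t).
    n_total, n_groups = len(responses), len(groups)
    res_ms = sum(e * e for e in simple) / (n_total - n_groups)
    s = math.sqrt(res_ms)

    result = []
    for trt, e in zip(treatments, simple):
        n_j = len(groups[trt])  # assumed to be at least 2
        result.append(e / (s * math.sqrt((n_j - 1) / n_j)))
    return result


# Hypothetical data (illustration only): unequal replication.
treatments = ["A", "A", "A", "B", "B"]
responses = [4.0, 6.0, 8.0, 10.0, 14.0]
std_res = standardized_residuals(treatments, responses)
print([round(r, 3) for r in std_res])
```

In this example the simple residuals of -2 in groups A and B map to different standardized values, because the two groups have different replication and hence different residual variances.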



5.2 Using Graphical Tools to Diagnose Problems 

Examination of the distribution of residual values is an essential step in the fitting of
any linear model. The two types of residuals described above (simple and standardized)
are both useful for detection of general inadequacies in the model and violations of the
assumptions. In the case of designed experiments with equal replication, the simple residuals
are proportional to the standardized residuals, and either set can be used. However,
it is usually more appropriate to use the standardized residuals, as these have the advantages
of a standard scale and common unit variance. In this section, we concentrate on a
few of the most commonly used graphical procedures (usually called residual plots) for
checking the validity of the most important assumptions (homogeneity of variance, independence
and Normality) underlying a fitted linear model. We describe several forms of
residual plot, often used in combination to provide an overall picture of the validity, or
otherwise, of the model.

5.2.1 Assessing Homogeneity of Variances 

The assumption that all deviations have equal variance (Assumption 2 of Section 4.1) is
often called the assumption of homogeneity of variances (or homoscedasticity). If the data
conform to this assumption, a plot of the standardized residuals, $r_i$, against the fitted values,
$\hat{y}_i$ (usually called a fitted values plot), should show approximately equal variance
(indicated by the vertical spread about zero) across the range of fitted values (e.g. Figure
5.2a). A variant of the fitted values plot (called an absolute residuals plot) replaces the
residuals with their absolute values, $|r_i|$. This plot should also show approximately equal
variance across the range of fitted values (e.g. Figure 5.2b). A smooth trend line can be
added to both plots to emphasize any pattern.

FIGURE 5.2

(a) Fitted values plot (residuals against fitted values) and (b) absolute residuals plot (absolute value of residuals
against fitted values) with trend line, both showing acceptable homogeneity of variance.



One common departure from constant variance occurs where the spread of the residuals
increases as the fitted values get larger (e.g. Figure 5.3). This pattern is often seen for discrete
data, such as counts, or continuous data, such as weights, where the data span a large
range. For data in the form of counts as a percentage of a fixed total, the variance is usually
small where the fitted value is close to either 0 or 100%, increasing towards a maximum
at the centre of the range (50%). Occasionally, variances may differ systematically among
treatment groups in a manner unrelated to their fitted values; these differences are more
difficult to detect from a fitted values plot but may be tested formally by Bartlett's test, as
described in Section 5.3.

The pattern in Figure 5.3 might be removed by the application of a variance-stabilizing
transformation of the response, and Chapter 6 deals with this topic in detail. Alternatively,
we might re-evaluate the assumption of Normality and decide that a different probability
distribution would be more appropriate and use the methods presented in Chapter 18. In
general, any strong pattern in the spread of the residuals, whether symmetric or asymmetric,
suggests a failure to meet the assumption of homogeneous variances. In regression
modelling, the fitted values plot can also be used to detect systematic deviations of the
model from the pattern in the response (see Section 13.1).




FIGURE 5.3 

(a) Fitted values plot and (b) absolute residuals plot with trend line, both showing a strong pattern of larger 
variance for larger fitted values. 



FIGURE 5.4 

(a) Fitted values plot and (b) absolute residuals plot with trend line, for the calcium pot trial (Example 5.1B). 



EXAMPLE 5.1B: CALCIUM POT TRIAL* 

Figure 5.4 shows the fitted value and absolute residuals plots (based on the standardized 
residuals) for total root length. The fitted values for this model correspond to the sample 
means of the four treatments, and so there are four columns of points on the graph. 
There is no strong pattern in the spread of the residuals across the fitted values. 



5.2.2 Assessing Independence 

Usually, the assumption of independent deviations (Assumption 3 of Section 4.1) can 
safely be made given knowledge of the experimental procedure. For example, when we 
select a random sample of individual plants from locations in a large field and measure 
the height of each plant, we can reasonably assume that the deviation from the mean for 
any particular plant has no association with that for any other plant. However, there are 
situations when dependence (or correlation) among deviations arises. For example, sup- 
pose plants were sampled in pairs from locations in a field; owing to local environmental 
conditions, we might expect the deviations within a pair to be correlated. Similarly, if the 
plants are processed by a machine that shows drift in measurements over time, this drift 
might be observed as a positive correlation in deviations from consecutive measurements. 
In general, proximity in location or time of measurement can provide a mechanism for 
correlation among deviations. 

Again, the distribution of the residuals can be used to investigate departures from this 
assumption of independent deviations. However, we stated above that the set of residuals 
are not independent, even if the deviations are independent. Since the induced correla- 
tion among residuals occurs within treatment groups, or within blocks, it should not be 
associated with trends in time or space (the usual sources of correlation) in a randomized 
experiment, and so the exploratory graphs described here should still reveal any strong 
serial (spatial or temporal) correlation. 

The first step in detecting correlation is to order the residuals by the suspected source 
of correlations, usually location or time. This order will often correspond to the physical 
layout or processing of the experiment, and so requires full details of the design to be 
recorded and stored as part of the data set. We denote a variable that gives this order as 
the index variable. Note that in some contexts there may be several index variables. For
example, in a two-phase study that consists of a field experiment followed by laboratory
processing, both the layout of the field plots and the order of processing in the laboratory
might be used as index variables. Dependence, or correlation, of deviations may be
detected graphically with an index plot, where the residuals ordered by the index variable
are plotted against the values 1 ... N. Positive correlation is demonstrated by a tendency
for adjacent residuals to have similar magnitude and sign (e.g. Figure 5.5a), and negative
correlation is indicated by alternating signs (e.g. Figure 5.5b).

FIGURE 5.5

Index plots showing (a) positively and (b) negatively correlated standardized residuals when plotted against the
time order in which the responses were obtained.

Alternatively, each residual can be plotted against the residual immediately preceding it
when ordered by the index variable. Correlation in the residuals would then be revealed
by evidence of a trend in the graph, with a positive slope indicating a positive correlation
(e.g. Figure 5.6a) and a negative slope indicating a negative correlation (e.g. Figure 5.6b),
rather than a random scatter over all four quadrants of the graph.
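
The pairing of each residual with its predecessor can be sketched as follows. This is a minimal illustration with hypothetical function names and made-up residuals; the lag-1 correlation coefficient is added here only as a numerical summary of the scatter and is not a formal test.

```python
import math


def lagged_pairs(index, residuals):
    """Order residuals by the index variable; pair each with its predecessor."""
    ordered = [r for _, r in sorted(zip(index, residuals), key=lambda p: p[0])]
    # Each pair is (previous residual, current residual).
    return list(zip(ordered[:-1], ordered[1:]))


def lag1_correlation(pairs):
    """Pearson correlation between previous and current residuals."""
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical residuals recorded out of order; the index gives the time order.
index = [3, 1, 2, 5, 4, 6]
residuals = [1.0, 1.0, -1.0, 1.0, -1.0, -1.0]

pairs = lagged_pairs(index, residuals)
print(pairs)                    # alternating signs once ordered by index
print(lag1_correlation(pairs))  # close to -1: strong negative correlation
```

Plotting the second element of each pair against the first gives the graph described above; here the points would fall along a line with negative slope.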

FIGURE 5.6

Standardized residuals ($r_i$), ordered by the index variable, plotted against the previous value in the series ($r_{i-1}$)
for (a) positively and (b) negatively correlated residuals.



Sometimes correlation between observations will purposely be incorporated into a
study. For example, to examine the effect of a growth regulator on the height of plants
over time, we might measure the same set of plants on consecutive days. The data then
contain several observations for each experimental unit which will show positive correlation;
a plant that is taller than average at the first measurement is very likely to still be
taller than average at the second measurement. This type of data is often called repeated
measurements or longitudinal data, and is closely related to time series data (see e.g.
Diggle et al., 2002). A similar situation occurs if data are related in space, for example, if
measurements are made at different distances along a transect (e.g. along a river or into a
field). While the index plot is useful for detecting such correlation, formal analysis of such
data usually requires a more complex approach that takes account of the spatial or temporal
correlation. There are several formal statistical tests for detecting serial (auto)correlation
in the deviations (i.e. correlation between adjacent deviations in space or time),
of which the most well known is the Durbin-Watson test. The discipline of geostatistics
provides some more modern diagnostics; see Webster and Oliver (2007) or Chilès and
Delfiner (2012).

If temporal or spatial correlation is expected prior to experimentation, then it should be
incorporated at the design stage. For example, if the correlation is associated with machine
drift, then all samples within a block should be processed together to confound differences
between blocks with differences in larger-scale time effects. However, if there is still
evidence of strong correlation within the set of residuals then the best solution is to model
this correlation formally. This requires the use of more sophisticated techniques (e.g. linear
mixed models) which we briefly introduce in Chapter 16, and which are described
further in the references suggested above.



EXAMPLE 5.1C: CALCIUM POT TRIAL* 

Figure 5.7a shows the index plot of standardized residuals for total root length, where 
the index variable is the pot number (variate Pot), which defines the order in which 
the experimental units were arranged and measured. There is no indication of strong 
positive or negative correlation. This is supported by Figure 5.7b where each residual is 
plotted against the residual for the preceding pot; there is a slight suggestion of negative 
correlation, but the points lie in all four quadrants with no strong trend. 








FIGURE 5.7 

(a) Index plot of standardized residuals, and (b) plot of standardized residuals versus previous residual (with 
order defined by pot number), for the calcium pot trial (Example 5.1C). 

5.2.3 Assessing Normality

The assumption that the deviations arise from a Normal distribution (Assumption 4 of 
Section 4.1) underlies the validity of the F-test for detecting differences between treatment 
means or evaluating the effect of a specific explanatory variable in a linear model. Note 
that this assumption is required only for tests of significance or computation of confidence 
intervals and not for parameter estimation, and, in fact, that the tests associated with the 
ANOVA are reasonably robust to departures from Normality, especially for randomized 
designs (see Section 5.2.4). 

If the deviations arise from a Normal distribution, then the residuals inherit this distri- 
bution, so it is natural to examine the residuals in this context. To ensure that the resid- 
uals have common variance, we use standardized residuals. We consider two types of 
residual plot for assessing the validity of this assumption: histograms (see Section 2.2) of 
the residuals and Q-Q (quantile-quantile) plots. If the Normality assumption holds, then
a histogram of the residuals should exhibit an approximately symmetrical, bell-shaped
distribution centered around zero (e.g. Figure 5.8a). The standard Q-Q plot displays the
ordered residuals plotted against the quantiles of the proposed probability distribution.
Several standard variations are used, for example, plotting the ith smallest residual against
the 100(i - 0.375)/(n + 0.25)th percentile of the proposed distribution. As we wish to assess
whether our residuals arise from a Normal distribution, we use quantiles from a standard 
Normal distribution (with zero mean and unit variance), and the resulting plot is also 
called a Normal plot. If the residuals are consistent with a sample from a Normal distribu- 
tion, then the plot should yield an approximately straight line passing through the origin. 
The slope of this line is determined by the standard deviation of the residuals, and so a 
Normal plot of standardized residuals should lie on the 1:1 line (e.g. Figure 5.8b). 

A skewed distribution of the residuals (e.g. a few very large positive residuals with a 
corresponding increase in the number of small negative residuals, or vice versa) results in 
a curved pattern, probably not passing through the origin. Distributions of residuals with 
fatter or thinner tails than a Normal distribution (i.e. with more or fewer large residuals, 
respectively), result in the relationship deviating from a straight line towards the extremes
(both positive and negative). 




FIGURE 5.8 

Checking Normality: (a) histogram of standardized residuals, and (b) Normal (or Q-Q) plot based on standard- 
ized (std) residuals with 1:1 line ( — ). 
The half-Normal plot is a variation in which the ordered set of the absolute values of
the residuals is plotted against quantiles of the standard Normal distribution. In this
case, the ith smallest absolute residual is plotted against the [50 + 50(i - 0.375)/(n + 0.25)]th
percentile. If the residuals are consistent with a Normal distribution, the plot should again
show an approximately straight line starting at the origin.
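
The plotting positions for the Normal and half-Normal plots can be computed directly. The sketch below uses the percentile variant quoted in the text, 100(i - 0.375)/(n + 0.25), with the standard library class `statistics.NormalDist`; the function names and the example residuals are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # zero mean, unit variance


def normal_plot_points(residuals):
    """(Normal quantile, ordered residual) pairs for a Normal (Q-Q) plot."""
    n = len(residuals)
    ordered = sorted(residuals)
    quantiles = [
        STD_NORMAL.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)
    ]
    return list(zip(quantiles, ordered))


def half_normal_plot_points(residuals):
    """(Normal quantile, ordered absolute residual) pairs for a half-Normal plot."""
    n = len(residuals)
    ordered = sorted(abs(r) for r in residuals)
    # Percentile [50 + 50(i - 0.375)/(n + 0.25)] expressed as a probability.
    quantiles = [
        STD_NORMAL.inv_cdf(0.5 + 0.5 * (i - 0.375) / (n + 0.25))
        for i in range(1, n + 1)
    ]
    return list(zip(quantiles, ordered))


# Hypothetical standardized residuals (illustration only).
example = [0.3, -1.2, 0.8, -0.1, 1.5]
qq = normal_plot_points(example)
half = half_normal_plot_points(example)
for q, r in qq:
    print(round(q, 3), r)
```

Plotting the second element of each pair against the first gives the Normal plot; for standardized residuals, the points are then judged against the 1:1 line.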

Note that formal statistical distribution tests, such as the Kolmogorov-Smirnov or
Anderson-Darling tests, are not strictly appropriate here as they make an assumption of
independent observations which is not obeyed by the set of residuals. We cannot therefore
recommend the use of these tests in this context.

There are several approaches for dealing with non-Normality of the residuals. One is to
use a non-parametric permutation test as an alternative to the F-test (see Section 5.2.4 for
more details). Skewness can sometimes be corrected by an appropriate transformation of
the data (see Chapter 6). In this situation, the shape of the histogram of the residuals (e.g. in
terms of the amount of skewness displayed) may give clues as to possible transformations
of the response. Alternatively, where the form of the data suggests that it is unlikely that a
Normal distribution can be assumed (e.g. for discrete data including counts or counts as
a percentage of a fixed total), more advanced techniques based on other probability distributions
(e.g. Poisson or Binomial) for the response variable can be used instead. These
analytical approaches are discussed briefly in Chapter 18.



EXAMPLE 5.1D: CALCIUM POT TRIAL* 

It is often useful to consider a set of residual plots together, and Figure 5.9 shows a composite
display for the standardized residuals, consisting of the fitted values and absolute
residuals plots from Figure 5.4 with a histogram of residuals and a Normal plot.

In this case, the fitted values plot (Figure 5.9a), the absolute residuals plot (Figure 5.9b) 
and the histogram (Figure 5.9c), while not perfect, do not indicate serious departures 
from homogeneity of variances or from Normality. In the Normal plot (Figure 5.9d), the 
residuals follow an approximately straight line and it appears that the data are consis- 
tent with the assumptions underlying the linear model. 



5.2.4 Using Permutation Tests Where Assumptions Fail 

Where residual plots indicate departures from Normality (but variances are acceptably 
homogeneous) an alternative to the F-distribution for assessing the significance of the size 
of an observed variance ratio is provided by a permutation test. In the simplest case of an 
unstructured sample, for example, the CRD, the observed data are randomly re-allocated 
to the units (or equivalently, the unit labels are randomly rearranged with the data values 
remaining fixed) and the analysis is repeated on the permuted data to obtain a new value 
of the test statistic, here the variance ratio. If there are no treatment differences, then the 
test statistic should take a similar value (subject to variations due to sampling) for any 
permutation. This permutation procedure is repeated many times to provide an empirical 
reference distribution for the test statistic formed under the null hypothesis of no treat- 
ment differences, against which the actual observed test statistic can be compared. The 
probability for the test is computed as the proportion of permutations in which the test 
statistic is more extreme (defined appropriately for one- or two-sided tests) than the origi- 
nal observed test statistic. An exact (exhaustive) test can be made if the number of possible 
permutations is small, but in general the test is evaluated for a large subset of random per- 
mutations (often 999 to provide a three-digit significance level). Note that where structure 
is present, the permutation procedure must take it into account. 
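
The permutation procedure for an unstructured design such as the CRD can be sketched in a few lines. The code below is an illustration only: the function names and data are hypothetical, the treatment labels are shuffled while the responses stay fixed, and the probability is computed as the proportion of permutations with a variance ratio at least as large as the observed one, following the description above (some software uses a slightly different convention that also counts the observed arrangement).

```python
import random
from collections import defaultdict


def variance_ratio(treatments, responses):
    """One-way ANOVA variance ratio (treatment MS / residual MS)."""
    groups = defaultdict(list)
    for trt, y in zip(treatments, responses):
        groups[trt].append(y)
    n_total, n_groups = len(responses), len(groups)
    grand_mean = sum(responses) / n_total
    trt_ss = 0.0
    res_ss = 0.0
    for ys in groups.values():
        mean = sum(ys) / len(ys)
        trt_ss += len(ys) * (mean - grand_mean) ** 2
        res_ss += sum((y - mean) ** 2 for y in ys)
    if res_ss == 0.0:
        return float("inf")  # no within-group variation
    return (trt_ss / (n_groups - 1)) / (res_ss / (n_total - n_groups))


def permutation_p_value(treatments, responses, n_perm=999, seed=1):
    """Proportion of random permutations with ratio >= the observed ratio."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    observed = variance_ratio(treatments, responses)
    labels = list(treatments)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # re-allocate treatment labels to units
        if variance_ratio(labels, responses) >= observed:
            count += 1
    return count / n_perm


# Hypothetical CRD with three well-separated treatments (illustration only).
treatments = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
responses = [1.0, 2.1, 2.9, 10.2, 11.1, 12.3, 20.5, 21.4, 22.0]
p_value = permutation_p_value(treatments, responses)
print(p_value)
```

This simple shuffle is valid only when there is no blocking or other structure; for structured designs the permutations must be restricted accordingly, as noted above.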



Checking Model Assumptions 









FIGURE 5.9 

A composite set of residual plots based on standardized (std) residuals for the calcium pot trial (Example 5.1D). 
(a) Fitted values plot, (b) absolute residuals plot, (c) histogram of residuals, and (d) Normal plot. 



Permutation tests are a form of non-parametric test, i.e. a test for which no probability
distribution is assumed, and can be derived for most hypothesis tests. Permutation tests
also have a connection with the validity of the F-tests derived from an ANOVA table. If
variances are similar across groups and the residual df are large, then the F-distribution
used in ANOVA gives a good approximation to the distribution of the permutation test
statistic under the null hypothesis (for more details see e.g. Box, Hunter and Hunter, 1978).
It follows that if treatments have been allocated to experimental units at random, then
under the null hypothesis, the observed F-statistic can be considered as a random sample
from the permutation distribution. The F-distribution thus approximates the correct reference
distribution without the need for distributional assumptions. Note that this reasoning
follows only for properly randomized designs (i.e. experimental studies), and does not
apply to many observational studies.

5.2.5 The Impact of Sample Size 

Interpretation of the residual plots can be somewhat subjective, and properties of the residuals
are much easier to assess visually when sample sizes are large. Take care therefore
when dealing with small data sets and, in these situations, a more lenient approach to
interpreting residual plots is generally allowed (although remember that the assumptions
must still be met for the analysis to be valid). Conversely, a stricter approach should usually
be taken for larger data sets.

FIGURE 5.10

Fitted values plots for four data sets, each simulated as a CRD with four equally replicated treatments with
Normal deviations and total number of observations equal to (a) 100, (b) 48, (c) 24 and (d) 12.

To illustrate the effects of sample size, we have simulated data from a CRD with four
equally replicated treatment groups, with replication n = 25, 12, 6 and 3, i.e. with a total of
N = 100, 48, 24 and 12 observations, respectively. The deviations were generated to obey all
the assumptions underlying the model, and the true population means for the treatment
groups took values 10, 13, 15 and 22. The estimated treatment means show some variation
about these true values, as expected. Figures 5.10, 5.11 and 5.12 show the fitted values plots,
histograms of residuals and Normal plots, respectively, based on standardized residuals
from ANOVA for each of the four simulated data sets. Although each of these data
sets obeys the assumptions of homogeneity of variance, independence and Normality, the
resemblance between the residual plots and the ideal patterns clearly gets worse as the
sample size decreases.



FIGURE 5.11

Histograms of residuals for four data sets, each simulated as a CRD with four equally replicated treatments
with Normal deviations and total number of observations equal to (a) 100, (b) 48, (c) 24 and (d) 12.



5.3 Using Formal Tests to Diagnose Problems

It is often difficult to judge by eye whether variances are similar across treatment groups,
either in residual plots (e.g. Figure 5.1), or in bar charts of treatment means including
within-group standard deviations (e.g. Figure 4.1). From both of these graphs, it appears
that the variation within treatment D is smaller than that within the other treatments.
However, the sampling variation in estimates of variances can be large, especially for small
data sets, and so it is often sensible to test formally whether such a set of sample variances
could plausibly have arisen from a population with a common variance (but allowing for
different group means). Here, we give details of Bartlett's test (Bartlett, 1937; Snedecor and
Cochran, 1989), but other tests (e.g. the $F_{\max}$-test of Hartley, Scheffé-Box test, Levene test) can
also be used (Sokal and Rohlf, 1995).

FIGURE 5.12

Normal plots based on standardized (std) residuals for four data sets, each simulated as a CRD with four
equally replicated treatments with Normal deviations and total number of observations equal to (a) 100, (b) 48,
(c) 24 and (d) 12.



Bartlett's test is based on the assumption that data have arisen as a number of samples
from Normally distributed populations. The number of samples corresponds to the t different
treatment groups in the CRD. The null hypothesis of the test is that the population
variances for the different treatment groups are all equal, and the alternative hypothesis is
that these population variances are not all equal. The test statistic is based on a comparison,
on the logarithmic scale, of the average of the treatment sample variances with a pooled
variance estimate. To construct these sample variances algebraically, it is convenient to label
observations as $y_{jk}$, where index j labels the treatment groups (j = 1 ... t) and k labels the
replicate values within treatments (k = 1 ... $n_j$), i.e. the same labelling as for the CRD (Chapter 4).
The unbiased sample variance for the jth treatment is then calculated (as in Section 2.1) as

$$s_j^2 = \frac{1}{n_j - 1} \sum_{k=1}^{n_j} (y_{jk} - \bar{y}_{j\cdot})^2,$$

where $\bar{y}_{j\cdot}$ is the sample mean for the jth treatment. The pooled variance estimate, denoted
$s^2_{\mathrm{pooled}}$, is calculated as

$$s^2_{\mathrm{pooled}} = \frac{1}{N - t} \sum_{j=1}^{t} (n_j - 1) s_j^2, \tag{5.1}$$

i.e. as a weighted sum of the unbiased treatment sample variances, and is equal to the estimate
of $\sigma^2$ (the ResMS) from the CRD analysis (Section 4.3). The test statistic, $X^2$, is

$$X^2 = \frac{1}{(1 + c)} \left[ (N - t) \log_e(s^2_{\mathrm{pooled}}) - \sum_{j=1}^{t} (n_j - 1) \log_e(s_j^2) \right],$$

where c is a scaling factor calculated as

$$c = \frac{1}{3(t - 1)} \left[ \sum_{j=1}^{t} \frac{1}{(n_j - 1)} - \frac{1}{(N - t)} \right].$$
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For groups with equal replication, i.e. Wy = n for j = 1 ... t, these expressions simplify fo 






(w-1) 

(1-tc) 



f lo§e(^’pooled) ^^loge(^’7 ) 



wifh c = 



f + 1 

3f(n - 1) ■ 



Under fhe null hypofhesis, ~ %?_i, i.e. X^ has an approximafe Chi-squared disfribufion 
wifh f - 1 df (Secfion 2.2.4). Inequalify of sample variances is indicafed by larger values of 
fhe fesf sfafisfic, so a one-sided fesf is appropriafe. So, if X^ is larger fhan fhe 100(1 - ajfh 
percenfile of fhis chi-squared disfribufion, fhere is evidence (af significance level aj fhaf 
fhe variances differ befween freafmenf groups. 

If data show evidence of unequal variances across treatment groups, and the variances
also change systematically with the treatment means (as seen in the fitted values plot),
then a transformation might resolve the issue (see Chapter 6). Alternatively, the test may
reflect non-Normality of the deviations, and a different probability distribution might be
considered (see Chapter 18). If the deviations appear to follow a Normal distribution but
the pattern in the variances is not related in a simple manner to trends in the treatment
means, then a weighted analysis can be used to account for the different variances associated
with different treatment groups. More details about fitting weighted linear models
can be found in Rawlings et al. (1998).
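
As an illustration only, the Bartlett calculation can be sketched in Python using the standard library. The function name `bartlett_statistic` is hypothetical and is not taken from any package; the inputs are the sample variances and replication quoted for the calcium pot trial in Example 5.1E, and the 95th percentile (7.815 on 3 df) is the value quoted in that example rather than one computed here.

```python
import math


def bartlett_statistic(variances, sizes):
    """Return (X2, c, pooled) for Bartlett's test of equal variances.

    variances: unbiased sample variances, one per treatment group
    sizes:     number of replicates in each group
    """
    t = len(variances)
    n_total = sum(sizes)
    # Pooled variance: weighted sum of the group variances (Equation 5.1)
    pooled = sum((n - 1) * s2 for n, s2 in zip(sizes, variances)) / (n_total - t)
    # Scaling factor c
    c = (sum(1.0 / (n - 1) for n in sizes) - 1.0 / (n_total - t)) / (3 * (t - 1))
    # Comparison of pooled and individual variances on the natural log scale
    numerator = (n_total - t) * math.log(pooled) - sum(
        (n - 1) * math.log(s2) for n, s2 in zip(sizes, variances)
    )
    return numerator / (1 + c), c, pooled


# Calcium pot trial: treatments A-D, n = 5 each (values quoted from Table 4.1)
variances = [135.20, 45.20, 105.70, 19.30]
sizes = [5, 5, 5, 5]
x2, c, pooled = bartlett_statistic(variances, sizes)

CRITICAL_95_3DF = 7.815  # 95th percentile of chi-squared on 3 df, as quoted in the text
print(f"pooled = {pooled:.2f}, c = {c:.4f}, X2 = {x2:.3f}")
print("evidence of unequal variances" if x2 > CRITICAL_95_3DF
      else "consistent with equal variances")
```

The general (unequal replication) form is used here, so the same function should also cover the equal-replication case as a special instance.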



EXAMPLE 5.1E: CALCIUM POT TRIAL* 

The unbiased sample variances for each treatment in the calcium pot trial were given
in Table 4.1 as $s_A^2 = 135.20$, $s_B^2 = 45.20$, $s_C^2 = 105.70$, $s_D^2 = 19.30$, with equal replication of
$n = 5$ for all treatments. The range of variances seems large, but each estimate is based
on only five observations. The pooled variance estimate is calculated with Equation 5.1
as

$$ s^2_{\text{pooled}} = \frac{4}{16} (135.20 + 45.20 + 105.70 + 19.30) = \frac{305.40}{4} = 76.35 . $$

As expected, this is equal to the ResMS from the ANOVA table in Example 4.1C. The
scaling factor $c$ is equal to

$$ c = \frac{t + 1}{3t(n - 1)} = \frac{4 + 1}{3 \times 4 \times 4} = 0.1042 , $$

and so

$$ X^2 = \frac{16 \times \log_e(76.35) - 4 \times \left( \log_e(135.20) + \log_e(45.20) + \log_e(105.70) + \log_e(19.30) \right)}{1 + 0.1042} = 3.633 . $$

The 95th percentile of the chi-squared distribution with 3 df is 7.815 and so the test
statistic is consistent with the null hypothesis of equal population variances across
treatments.



5.4 Identifying Inconsistent Observations

An outlier is an observation that is in some way inconsistent with the rest of the data set. 
In the context of designed experiments with replication, an outlier is usually an observation
that is inconsistent with the other observations in a treatment group. Here, we use
the term outlier to describe points with large residuals from the fitted model; therefore,
the identification of a point as an outlier may change with the proposed model. Residual
plots can be used to identify potential outliers, and many statistical packages automatically
identify observations with large residuals as potential outliers.
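
To make the idea of flagging large residuals concrete, here is a minimal Python sketch for a one-way (CRD) layout. It rests on several assumptions that are not taken from the text: the data are invented, the function name `standardized_residuals` is hypothetical, the standardized residual is taken to be the simple residual divided by the square root of ResMS × (1 − 1/n_j), and the cut-off of 2 is an arbitrary illustrative threshold rather than a recommended rule.

```python
import math


def standardized_residuals(groups):
    """Standardized residuals from a one-way ANOVA.

    groups: dict mapping treatment label -> list of responses
    """
    n_total = sum(len(ys) for ys in groups.values())
    means = {g: sum(ys) / len(ys) for g, ys in groups.items()}
    # Residual sum of squares and residual mean square (ResMS)
    res_ss = sum((y - means[g]) ** 2 for g, ys in groups.items() for y in ys)
    res_ms = res_ss / (n_total - len(groups))
    result = {}
    for g, ys in groups.items():
        # Standard error of a residual in a group with len(ys) replicates
        se = math.sqrt(res_ms * (1 - 1 / len(ys)))
        result[g] = [(y - means[g]) / se for y in ys]
    return result


# Hypothetical data: three treatments, four replicates each
groups = {
    "A": [10.0, 12.0, 11.0, 13.0],
    "B": [15.0, 14.0, 16.0, 15.0],
    "C": [20.0, 21.0, 19.0, 8.0],
}
std = standardized_residuals(groups)

THRESHOLD = 2.0  # illustrative cut-off only
flagged = [(g, i) for g, rs in std.items() for i, r in enumerate(rs)
           if abs(r) > THRESHOLD]
print(flagged)
```

A flagged observation is only a candidate for investigation; the sketch does nothing to decide whether it should be retained.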



EXAMPLE 5.2: DISEASE PROGRESS 

An experiment, designed as a CRD, investigated disease progression within leaves of
oilseed rape plants. The amount of pathogen DNA extracted from leaves of inoculated
plants in 11 lines of oilseed rape was measured for 12 replicate plants of each line. The
$\log_{10}$-transformed DNA values were analysed. Figure 5.13 shows a composite set of
residual plots based on the standardized residuals. One observation clearly stands out
as having a much larger (negative) residual and should be investigated as a possible
outlier. Apart from this one observation, the residual plots are otherwise satisfactory: the
histogram is symmetric, the variances are similar across the treatments (range of fitted
values) and the Normal plot shows a straight line.

FIGURE 5.13

Composite set of residual plots based on standardized (std) residuals from an experiment to measure disease
progress within leaves (Example 5.2). (Data from Y. Huang, Rothamsted Research.)



Outliers may be ascribed to several different sources: (1) problems with the experimental 
procedure; (2) errors in the recording, transcription or data input procedures; (3) an incor- 
rect or incomplete model specification; and (4) a genuine observation that is incompatible 
with the rest of the observations.

The first step in dealing with an outlier is to try to determine its origin. Laboratory
notebooks may reveal where problems with the experimental procedure were experienced
or suspected. For example, the effectiveness of the inoculation on the outlying plant in
Example 5.2 might be in question. Maintenance records can be consulted to check the
calibration of equipment. Original data records should always be cross-checked to detect
errors: common errors include the transposition of digits (e.g. 54.2 becomes 45.2) and
movement of the decimal point (e.g. 54.2 becomes 542 or 5.42). Any proven errors should
be corrected or, if they cannot be corrected, the observations should be set as missing or
removed, and any dubious observations should be flagged. Finally, you should consider
whether an observation might correspond to a different population (e.g. a different species
or subspecies) from the one of interest, in which case a different result might be expected
for that observation.

The next step is to decide what to do with the observations identified as anomalous but
where no error can be proved: you must decide whether to retain or remove these observations.
It is helpful to consider at this point whether the model is compatible with the data.
Are there any important explanatory variables that have not been included in the model?
Inclusion of these variables might improve the fit of the model and reduce the number of
potential outliers. For example, if species are showing different reactions to a treatment,
then explicitly allowing for this differential response in the model might accommodate
all of the observations. If the deviations do not follow a Normal distribution with equal
variance, then data transformation might be advantageous (see Chapter 6) or another
probability distribution can be used (see Chapter 18). In Figure 5.13, the residual plots are
entirely acceptable apart from the single outlying point, so neither of these options would
be justified.

Outliers should not be discarded indiscriminately or without careful consideration, not
least because observations are often expensive to obtain, but also because this will affect
the analysis. Sometimes, particularly with large data sets (such as Example 5.2), the effect
of removing an outlier is negligible, but usually estimates of the model parameters will
change and the estimated residual variance will decrease, sometimes substantially, thus
increasing the chance of rejecting $H_0$. This is not necessarily desirable, as the residual
variance will be underestimated when we eliminate outliers that are genuine observations,
resulting in a larger Type I error (more false-positive results) than expected. The decision
on which outliers to retain and which to eliminate must always be reported. If the results
change markedly when outliers are excluded, then it is good practice to report these
differences. Remember that it may be considered fraudulent to remove 'inconvenient' data
points from an analysis without good justification. In addition, always bear in mind that
if the outlier is a genuine observation, then it might be the most important point in the
study, because it indicates unexpected behaviour in the system. A story about the hole in
the ozone layer is often quoted in this context. The traditional version of the tale relates
that NASA scientists should have been the first to discover the ozone hole over Antarctica,
but their satellite instruments used automatic outlier detection and deletion methods
that removed anomalous readings, and so the seasonal ozone depletion went unnoticed
until reported by Farman et al. (1985). In fact, Pukelsheim (1990) reports that the satellite
observations were flagged, checked against data from a ground station, and only then
dismissed because they contradicted the ground station readings. It later transpired that the
ground station data were misleading due to faulty instrument settings! We believe that the
full version of this story carries even more warnings than the traditional version: certainly
you should not use automatic methods to delete outliers without examining them first,
but when you cross-check against another method, you also need to be sure that the other
method is reliable.

EXAMPLE 5.1E: CALCIUM POT TRIAL* 

In the calcium pot trial, treatment group C has four observations in the range 70-74 
(Table 4.1) and one much smaller observation with value 49 (pot 4). This large discrep- 
ancy suggests that this observation should be investigated as a potential outlier. This 
observation has a standardized residual of -2.33. However, although it is the largest 
residual (in absolute value), it does not appear to be inconsistent with the overall dis- 
tribution of the residuals shown in Figure 5.9. It is therefore sensible to examine this 
observation for potential sources of error, as suggested above, but if none is found, there 
is no justification for removing it from the analysis. 



An alternative analytical approach that allows the retention of potential outliers is the
use of 'robust' statistical methods. These are designed to be less sensitive to the presence
of outliers, and further details can be found in Barnett and Lewis (1994).



EXERCISES 

5.1 Obtain the simple and standardized residuals from the ANOVA for the data
from (a) Exercise 4.2 and (b) Exercise 4.3. Use a scatter plot to compare the simple
and standardized residuals in each case. Can you explain the patterns that
you see? Are there any potential outliers?

5.2 For the data sets in each of Exercises (a) 4.2, (b) 4.3, (c) 4.4 and (d) 4.5, produce
a set of residual plots based on standardized residuals, including a histogram
of residuals, a fitted values plot, an absolute residuals plot and a Normal plot.
Give a critical assessment of whether the ANOVA assumptions are reasonable
in each case. Is there any evidence of outliers?

5.3* An experiment compared the growth of tomato seedlings in eight commercial
composts. Space was available in a glasshouse to place 32 small pots in a single
line along the edge of one bench. The area was assumed to be homogeneous
and so a CRD was used, with four pots of each type of compost. One seedling
was transplanted into each pot and the heights of the young plants (cm)
were measured after 2 weeks. Pots were numbered 1-32 along the bench and
plants were measured in order of pot number. The pot numbers (Pot), composts
used (factor Compost) and resulting plant heights (variate Height) are in file
compost.dat. Analyse these data and inspect standardized residual and index
plots. Are the assumptions of your model satisfied?

5.4 Compare the unbiased sample variances for each treatment group from
Exercise 4.3 using Bartlett's test. Is there any evidence of variance heterogeneity?

5.5* An experiment was devised to evaluate the effect of four watering regimes
on root growth using a CRD. Each regime was applied to 12 individual plants
growing in pots. Measurements of total root length (cm) were made at the end
of the experiment. The unit numbers (Pot), watering regimes (factor Regime)
and total root lengths (variate Length) are in file watering.dat. Analyse these
data and inspect plots of standardized residuals. Are there any potential outliers?
On inspection of the original data sheets, it was discovered that one observation
had been mistyped as 47 rather than 74. Correct this data value and
rerun the analysis. Comment on whether the model assumptions are reasonable
for these data.



6 

Transformations of the Response 



In Chapter 5, we discussed graphical diagnostic tools for inspecting residuals and check- 
ing the assumptions underlying the linear model (see Section 4.1). In particular, we con- 
sidered how to evaluate the assumptions that the deviations have a common variance 
(homogeneity of variances, Section 5.2.1) and come from a Normal distribution (Section
5.2.3). We indicated that transformations of the response variable might be used to remove
skewness in the distribution of the residuals (i.e. evidence of a non-Normal distribution)
or to correct systematic changes in the variances of the residuals (i.e. evidence of variance
heterogeneity, as illustrated in Figure 5.3).

In this chapter, we show how different transformations can be used to deal with particular
violations of the model assumptions. We start by discussing the general rationale
for data transformations (Section 6.1) and then concentrate on three particularly useful
transformations: the log and square root transformations for positive response variables
(i.e. with values > 0), and the logit transformation for proportions (between 0 and 1) or
percentages (between 0 and 100, Section 6.2). It is usually desirable to relate results to the
original scale of measurement, and so we also discuss the procedure of back transformation
(Section 6.3). We then take a closer look at the interpretation of the log transformation
in terms of a multiplicative model (Section 6.4). Finally, we review some other methods for
analysing non-Normal responses, which are useful when transformation is either unsuccessful
or inappropriate (Section 6.5).



6.1 Why Do We Need to Transform the Response? 

The validity of the conclusions from a statistical analysis depends on the validity of the 
assumptions underlying that analysis. Applying a transformation to the response vari- 
able is an option that may allow the assumptions to be satisfied sufficiently so that we can 
continue to use the simple linear model and reach valid conclusions. A transformation, or 
data transformation, is the process of using a mathematical function to map the response 
variable from the original scale of measurement onto another (the transformed) scale. 

In the context of simple models involving factors, the most common reason for transform- 
ing responses is to stabilize the variance of the residuals so that a pooled estimate (across all 
treatments) can sensibly be used, i.e. to make the assumption of homogeneity of variance, 
or homoscedasticity, valid. A second important use of transformation is to make the distri- 
bution of the deviations closer to a Normal distribution (recall that conclusions of ANOVA 
based on the t- and F-statistics rely on the deviations having an approximate Normal distri- 
bution). Finally, the use of a transformation may provide a scale on which the additive form 
of the linear model is more realistic. If we are fortunate, then a transformation may achieve 
several aims simultaneously. However, on some occasions, whilst one aim is achieved by 
the use of transformation, other aspects of the analysis may be made worse.

The assumption that the deviations follow a Normal distribution implies that the
response should be measured on a continuous, or close-to-continuous, scale without any
abrupt truncation. There are several common types of response that do not apparently
comply with this description. For example, counts must be non-negative (i.e. $y \geq 0$) and can
take only integer values; they are clearly not continuous. If the counts are small then the
distribution may also be abruptly truncated at zero. These factors are especially important for
small counts. Proportions ($0 \leq p \leq 1$) that are calculated with respect to small samples (e.g.
the proportion of dead aphids in a sample of 10) can take only certain values (e.g. 0, 0.1, 0.2
... 1) and so again are not continuous, and may also be abruptly truncated if many
proportions are close to zero or one. These are obvious cases where the Normal assumption
is invalid. In addition, for these types of response the variance is almost always related to
the expected response for an observation (usually termed a variance-mean relationship).
For counts and proportions there are other analytical approaches based on more appropriate
probability distributions (i.e. Poisson and Binomial, respectively). We briefly outline
these approaches, and when they are likely to be required, in Section 6.5 but we leave a
more detailed discussion until Chapter 18. Other types of response may be continuous but
also show a variance-mean relationship (and possibly truncation). For example, quantities
such as height or weight are usually (effectively) continuous but must be positive, and
larger expected values are often associated with greater variation. Some proportions may
be effectively continuous, for example, percentage area as assessed by eye or a computer,
but may show greater variance around 0.5 than at the limits of the scale (0 or 1). In all of
these cases, transformation often provides a good-enough approximation to the assumptions
of the linear model to make the analysis valid and reliable.

After transformation (based on consideration of the residual plots introduced in Chapter
5), the analysis should be repeated for the transformed response, and residuals from the
new analysis should be inspected as usual. Sometimes, it is necessary to try several different
transformations until satisfactory residual plots are obtained, and sometimes it will
not be possible to find a suitable transformation.



6.2 Some Useful Transformations 

In this section we describe the most common transformations, concentrating on the
logarithmic, square root and logit transformations.

6.2.1 Logarithms 

Possibly, the most common transformation is the logarithmic (log) transformation, which
can take several related forms. The concept of the log transformation can be most easily
understood for the common logarithm (log to base 10), which maps the original response
variable, $y$, to a new variable, $z$, such that $y$ is equal to 10 raised to the power $z$, i.e. $y = 10^z$,
and usually written as $z = \log_{10}(y)$. Values of $z$ = 0, 1 or 2 on the $\log_{10}$ scale thus correspond
to values of $y = 10^0 = 1$, $10^1 = 10$ or $10^2 = 100$ on the original scale. The natural logarithm
(log to base $e$) works in a similar way but uses Euler's number, $e = 2.71828\ldots$. The natural
logarithm maps $y$ to $z$ such that $y$ is equal to $e$ raised to the power $z$, i.e. $y = e^z$, or
equivalently $y = \exp(z)$, and can be written as $z = \log_e(y)$. Logarithms to other bases are
similarly defined, and any logarithm can be used for a log transformation. There are several
different conventions of notation for specifying logarithms. Elsewhere you may see $\log(y)$
used for either $\log_{10}(y)$ or $\log_e(y)$, and $\ln(y)$ for $\log_e(y)$. In this book we use the unsubscripted
term 'log' to represent a logarithm in general discussion, but always specify the base as a
subscript for specific examples.
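
The mapping between scales can be checked with a few lines of Python (standard library only). This is just an illustrative sketch; the variable names are arbitrary. It also shows why the choice of base is a matter of convenience: logarithms to different bases differ only by a constant multiplier, here $\log_e(y) = \log_e(10) \times \log_{10}(y)$.

```python
import math

# Values on the original scale and their common (base 10) logarithms
originals = [1, 10, 100]
common_logs = [math.log10(y) for y in originals]   # base 10
natural_logs = [math.log(y) for y in originals]    # base e

for y, z10, ze in zip(originals, common_logs, natural_logs):
    print(f"y = {y:>3}: log10(y) = {z10:.3f}, loge(y) = {ze:.3f}")

# Changing base only rescales the transformed values by a constant factor
test_values = [2, 50, 1000]
ratios = [math.log(y) / math.log10(y) for y in test_values]
print(ratios, math.log(10))
```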

All log transformations are defined only for y > 0, i.e. for strictly positive values, but the 
transformed variable, z, can take any value. Values greater than one on the original scale 
(y > 1) are mapped onto positive values (z > 0), and values less than one on the original 
scale (0 < y < 1) are mapped onto negative values (z < 0), with y = 1 always mapped onto 
z = 0. Log transformations bring larger values closer together and spread smaller values 
relatively further apart. They therefore provide a useful way of dealing with responses 
that exhibit a right-skewed distribution, i.e. with only a short tail to the left of the distribu- 
tion peak and a long tail stretched out to the right (such as that illustrated in Figure 6.1a), 
making the distribution more symmetrical (as shown in Figure 6.1b) after log transforma- 
tion. Common examples include counts of insects on plants (where a few plants are very 
heavily infested), measures of size such as weights or lengths (where a few individuals 
are particularly heavy or large), counts of colony-forming units in pathogen cultures, and 
concentrations of plant nutrients and trace metals in soil samples. Clearly, if the responses 
are already reasonably symmetric or left-skewed then a log transformation is unlikely to 
be helpful. 

Log transformations are also useful for stabilizing the variance of responses for which 
the variance increases in proportion to the expected response, a feature often associated 
with integer counts in ecology, and usually detected via fitted value plots as in Figure 5.3. 
On applying a log transformation, the variation associated with larger values is decreased, 
hopefully achieving homogeneity of variances across the range of the variable. 

Finally, a log transformation can be applied to transform a multiplicative model onto an 
additive scale, as required for the form of the linear model. This valuable consequence of 
applying a log transformation is discussed in detail in Section 6.4. 

The choice of which base to use is arbitrary, though the type of response may suggest
the choice of a particular base. For example, for numbers of colony-forming units,
where values are often in the range from $10^4$ to $10^8$ across a set of observations, and
for which changes of an order of magnitude are important, the obvious choice is the
logarithm to base 10 because, for example, a unit change on the $\log_{10}$ scale reflects a
10-fold increase in numbers. The default used by many statisticians in the life sciences
is the natural logarithm, possibly because Euler's number, $e$, seems to be related to many
natural phenomena.

FIGURE 6.1

Distribution of a sample of insect counts: (a) original responses and (b) following $\log_e$ transformation (axis
labels within parentheses indicate value on original scale).

An important consequence of applying a log transformation is that the influence of larger
observations on treatment or group means is less than on the original scale, whilst the
influence of small observations is increased. This may be inappropriate in some contexts.

A major constraint of applying a log transformation is that it is defined only for positive
values, y > 0. However, for many types of non-negative response, such as integer counts, zero
is a valid observation and applying a log transformation then results in an undefined
value. When there are only a few zero values, it is common practice to add a small offset,
c, to every response prior to applying the transformation function, so the transformation
becomes

$$ z = \log(y + c) . $$

The inclusion of an offset provides a degree of flexibility in the transformation process, but
the choice made can affect the outcome, so the offset should be chosen with some care. For
integer insect counts, it is usual to add an offset of c = 1, i.e. z = log(y + 1). In other cases, a
simple rule of thumb is to use an offset equal to half the smallest positive value recorded.
For example, if the smallest positive value for a response variable is 0.1 g, we might add an
offset of c = 0.05. In practice, it might be necessary to try several different offsets to find a
value that gives adequate residual plots.
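
The two offset conventions described above can be sketched in Python as follows. The helper names (`half_min_positive`, `log10_with_offset`) and both data sets are hypothetical, and base 10 is used purely for illustration; any other base could be substituted.

```python
import math


def half_min_positive(values):
    """Rule-of-thumb offset: half the smallest positive value recorded."""
    positives = [v for v in values if v > 0]
    if not positives:
        raise ValueError("no positive values to base the offset on")
    return min(positives) / 2


def log10_with_offset(values, offset):
    """Apply z = log10(y + c) to every response."""
    return [math.log10(v + offset) for v in values]


# Hypothetical integer insect counts containing zeros: use c = 1
counts = [0, 3, 12, 0, 47]
z_counts = log10_with_offset(counts, 1)

# Hypothetical weights (g) with smallest positive value 0.1: use c = 0.05
weights = [0.0, 0.1, 0.4, 2.5]
c = half_min_positive(weights)
z_weights = log10_with_offset(weights, c)

print(z_counts)
print(c, z_weights)
```

Because the outcome can depend on the offset, in practice one would rerun the analysis and inspect the residual plots for each candidate value of c.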



EXAMPLE 6.1A: BEETLE MATING 

An experiment was conducted to investigate the viability of interspecies mating in leaf 
beetles by examination of the results when females from two species of willow beetle 
(the brassy willow beetle, Phratora vitellinae, and the blue willow beetle, Phratora vulga- 
tissima) were mated with males from either their own species (intraspecies mating) or 
the other species (interspecies mating), i.e. there were four treatments (t = 4) in total (for 
further details, see Peacock et al., 2004). The experiment was carried out as a CRD (com- 
pletely randomized design) with 10 replicates of each treatment (n = 10). We analyse the 
number of eggs laid by each female; the data are presented in Table 6.1 and can be found 
in file beetles.dat, where factor Treatment has four levels (labelled as 1 = P. vit. × interspecies,
2 = P. vit. × intraspecies, 3 = P. vulg. × interspecies, 4 = P. vulg. × intraspecies) and
the response is held in variate Eggs.

The untransformed counts were analysed by one-way ANOVA (see Section 4.3) using
model

Response variable: Eggs

Explanatory component: Treatment

A composite set of residual plots based on standardized residuals from this analysis
is shown in Figure 6.2. In the fitted values and absolute residual plots, the variance
appears greater for larger fitted values, with a suggestion of skewness in the histogram
and a slight curve in the Normal plot, both showing some possible outliers.

TABLE 6.1

Number of Eggs Laid by Females of Two Willow Beetle Species
(P. vitellinae and P. vulgatissima) Following Inter- or Intraspecies
Mating (Example 6.1A and File beetles.dat)

                        P. vitellinae                  P. vulgatissima
Replicate         Interspecies   Intraspecies    Interspecies   Intraspecies
                     j = 1          j = 2           j = 3          j = 4
1                      57             90              82            136
2                      15             80              91            117
3                      40            101              66            181
4                      34             59              98             41
5                      42             73              82             89
6                      19             51             134            106
7                      43             43              51            133
8                      39             57              96             98
9                      36             42              52            106
10                     24             66              91             79
Arithmetic mean      34.9           66.2            84.3          108.6
Geometric mean       32.6           63.6            81.1          102.0

Source: Data from Rothamsted Research (A. Karp).

Note: The last two rows give the arithmetic sample mean, $\bar{y}_{j\cdot}$, and the geometric sample mean
for the $j$th treatment.

In an attempt to remove the observed variance-mean relationship, the counts were
transformed to the $\log_{10}$ scale as logEggs = $\log_{10}$(Eggs), and the transformed response
was analysed, using model

Response variable: logEggs

Explanatory component: Treatment

This model can be written in mathematical form as

$$ \text{logEggs}_{jk} = \text{Treatment}_j + e_{jk} , $$

where $\text{logEggs}_{jk}$ is the $\log_{10}$-transformed number of eggs for the $k$th replicate ($k = 1 \ldots 10$)
of the $j$th treatment (numbered 1 ... 4 as for the factor levels given above) with deviation
$e_{jk}$, and $\text{Treatment}_j$ is the population mean for the $j$th treatment on the $\log_{10}$ scale. Plots
of standardized residuals from this new model are shown in Figure 6.3. The spread of
residuals is now more consistent across the range of fitted values, although still not perfect.
The histogram seems slightly skewed in the other direction, and the Normal plot
still shows some curvature, although the extreme points are now more consistent with
the overall pattern. The ANOVA table for this model is Table 6.2.

The null hypothesis is that the treatment population means on the $\log_{10}$ scale are
all equal. On the basis of the ANOVA of the transformed response ($F_{3,36} = 19.254$,
P < 0.001) we should reject this hypothesis and conclude that differences exist
between the treatment means on the $\log_{10}$ scale. The sample means calculated from
the logged counts for the four treatments, used to predict the corresponding population
means, are listed in Table 6.3. The ResMS gives $s^2 = 0.0238$, from which we derive
the common SEM of $\sqrt{s^2/10} = 0.0488$, and the SED of $\sqrt{2s^2/10} = 0.0690$ for comparing
pairs of treatments, both with 36 df. Most eggs are laid by the intraspecies-mated
P. vulgatissima females (treatment 4) and fewest by the interspecies-mated P. vitellinae
females (treatment 1). In Section 6.3, we discuss how to relate these predictions back
to the original scale.

FIGURE 6.2

A composite set of residual plots based on standardized residuals for the number of eggs laid in the beetle
mating experiment (Example 6.1A).



TABLE 6.2

ANOVA Table for the Log10-Transformed Number of Eggs from the Beetle
Mating Experiment with Four Treatments (Factor Treatment) (Example 6.1A)

Source of               Sum of      Mean      Variance
Variation       df      Squares     Square    Ratio        P
Treatment        3      1.3751      0.4584    19.254       < 0.001
Residual        36      0.8571      0.0238
Total           39      2.2322



TABLE 6.3

Predicted Treatment Means for Log10-Transformed Numbers of
Eggs from the Beetle Mating Experiment, with SEM = 0.0488,
SED = 0.0690 on 36 df (Example 6.1A)

Treatment 1          Treatment 2          Treatment 3           Treatment 4
(P. vit. × Inter)    (P. vit. × Intra)    (P. vulg. × Inter)    (P. vulg. × Intra)
1.513                1.804                1.909                 2.008

FIGURE 6.3

A composite set of residual plots based on standardized residuals for the log10-transformed number of eggs
from the beetle mating experiment (Example 6.1A).



We return to the log transformation in Sections 6.3 and 6.4 and discuss alternatives to 
the log transformation for counts in Section 6.5. 
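
For readers who want to check the arithmetic of Example 6.1A, the one-way ANOVA of the log10-transformed counts can be reproduced from the data in Table 6.1 with a short Python sketch. This is a hand-rolled calculation using only the standard library, not the software used for the book; the variable names are arbitrary, and no P-value is computed.

```python
import math

# Egg counts from Table 6.1, keyed by treatment number (j = 1 ... 4)
eggs = {
    1: [57, 15, 40, 34, 42, 19, 43, 39, 36, 24],       # P. vit. x interspecies
    2: [90, 80, 101, 59, 73, 51, 43, 57, 42, 66],      # P. vit. x intraspecies
    3: [82, 91, 66, 98, 82, 134, 51, 96, 52, 91],      # P. vulg. x interspecies
    4: [136, 117, 181, 41, 89, 106, 133, 98, 106, 79], # P. vulg. x intraspecies
}

# Transform to the log10 scale
log_eggs = {j: [math.log10(y) for y in ys] for j, ys in eggs.items()}

t = len(log_eggs)                                   # number of treatments
n = 10                                              # replicates per treatment
n_total = sum(len(zs) for zs in log_eggs.values())  # total observations

means = {j: sum(zs) / len(zs) for j, zs in log_eggs.items()}
grand_mean = sum(z for zs in log_eggs.values() for z in zs) / n_total

# Sums of squares for the one-way ANOVA
trt_ss = sum(len(zs) * (means[j] - grand_mean) ** 2 for j, zs in log_eggs.items())
res_ss = sum((z - means[j]) ** 2 for j, zs in log_eggs.items() for z in zs)

trt_ms = trt_ss / (t - 1)
res_ms = res_ss / (n_total - t)
f_ratio = trt_ms / res_ms

sem = math.sqrt(res_ms / n)       # standard error of a treatment mean
sed = math.sqrt(2 * res_ms / n)   # standard error of a difference

print(f"Treatment SS = {trt_ss:.4f}, Residual SS = {res_ss:.4f}")
print(f"F = {f_ratio:.3f} on {t - 1} and {n_total - t} df")
print({j: round(m, 3) for j, m in means.items()}, round(sem, 4), round(sed, 4))
```

The printed sums of squares, variance ratio, means, SEM and SED can be compared directly with Tables 6.2 and 6.3.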



6.2.2 Square Roots 

The square root transformation, $z = \sqrt{y}$, is defined for all non-negative numbers, i.e. positive
numbers and zero ($y \geq 0$), and maps onto that same range ($z \geq 0$). Hence, it is a potentially
useful transformation for any response that can take only non-negative values, but it
can be particularly appropriate for responses measured as areas, as their square roots can
be interpreted as being proportional to an average radius or diameter. It can also be used
as an alternative to a log transformation for positive responses. Like the log transformation,
the square root transformation tends to bring larger values closer together and spread
smaller ones relatively further apart. However, for the square root transformation this
rescaling is not as strong and so it may be more successful in cases where the log transformation
has over-corrected skewness or variance heterogeneity. The effect of the square root
transformation in correcting skewness is shown in Figure 6.4. Because this transformation
is defined for zeros (y = 0), it can be used without an offset when zero responses are present.
However, some authors have suggested that an offset of c = 0.5 might still be useful in
this case (see Sokal and Rohlf, 1995).

FIGURE 6.4

Distribution of a sample of area measurements: (a) original responses and (b) following square root
transformation (axis labels within parentheses indicate value on original scale).



6.2.3 Logits 

Data in the form of proportions usually (but not always) occur where each experimental 
unit contains a sample of m entities (usually the same number for each experimental unit) 
and the number that fall into a specific category, y, has been counted. For example, we might 
sample 25 weed plants from a field plot and count the number showing herbicide resistance.
Such data are usually reported as proportions, p = y/m (for 0 ≤ p ≤ 1), or as percentages,
P = 100 × y/m (for 0 ≤ P ≤ 100), and can often be described by a Binomial distribution
(see Section 2.2.1), for which the variance is directly related to the expected response. For
this type of response, the logit transformation, defined as $z = \log_e(y/(m - y))$ or equivalently
$z = \log_e(p/(1 - p))$ or $z = \log_e(P/(100 - P))$, is often applied to remove or reduce the
variance-mean relationship and provide the homogeneity of variance assumed for a linear model.

Occasionally, proportions (or percentages) do not have a direct interpretation as p = y/m. 
For example, computer measurement of the proportion of lesion area on plant leaves works 
on an effectively continuous scale. However, these measurements may display similar patterns
of variance heterogeneity so that application of the logit transformation in the form
$z = \log_e(p/(1 - p))$ or $z = \log_e(P/(100 - P))$ is still appropriate.

Theoretically, the logit transformation maps from the range (0, 1) onto an unrestricted 
range, but in practice we need consider only the range (-4, 4) for most applications. The 
proportion 0.5 maps onto a logit of zero, with proportions less than 0.5 resulting in negative 
values and proportions greater than 0.5 resulting in positive values. The logit transformation
tends to bring values at the centre of the range (close to 0.5) closer together and to spread values
at the ends of the range (approaching 0 or 1) relatively further apart. Figure 6.5a shows
the distribution of proportions obtained as the number of diseased plants out of samples
of size 25 for a survey of a single variety with mean prevalence of 0.8. Figure 6.5b shows
the logit-transformed proportions, which are all positive as the original proportions were
greater than 0.5. The logit transformation has corrected the left-skewness of the untransformed
responses by spreading out the larger proportions relative to those closer to 0.5.
Proportions often show an increased variance for values with an expected response around
0.5, relative to those nearer to the ends of the scale (in many cases, a property inherited from
the Binomial distribution), and the logit transformation tends to counteract this property
and, if we are fortunate, result in homogeneity of variances across the range of the variable.
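
The mapping described above can be sketched in a few lines. This is an illustrative sketch only; the function names `logit` and `inverse_logit` are our own and are not taken from any package. It shows that 0.5 maps to zero, that equal steps in the proportion are stretched near the ends of the range, and which proportions correspond to the practical logit range (-4, 4).

```python
import math

def logit(p):
    """Natural-log odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def inverse_logit(z):
    """Proportion corresponding to a logit value z."""
    return 1.0 / (1.0 + math.exp(-z))

for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"p = {p:.1f} -> logit = {logit(p):+.3f}")

# The same step of 0.1 in p covers more of the logit scale near the ends.
step_centre = logit(0.6) - logit(0.5)
step_end = logit(0.9) - logit(0.8)
print(f"step near centre: {step_centre:.3f}, step near end: {step_end:.3f}")

# Proportions corresponding to the practical logit range (-4, 4).
low, high = inverse_logit(-4.0), inverse_logit(4.0)
print(f"logit -4 -> p = {low:.3f}; logit +4 -> p = {high:.3f}")
```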
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FIGURE 6.5 

Distribution of a sample of proportions: (a) original responses and (b) following logit transformation (axis labels 
within parentheses indicate value on original scale). 



The logit transformation has the constraint that it is undefined at the limits of the range,
i.e. for p = 0 or p = 1 (or P = 0 or P = 100). To avoid the problem that this creates, we might
compute the proportion from a sample of size m using an offset c as p = (y + c)/(m + 2c).
This moves the ends of the range symmetrically away from zero and one, as the minimum
proportion is then c/(m + 2c) (> 0) and the maximum is (m + c)/(m + 2c) (< 1), with the centre
of the range unmoved, as (m/2 + c)/(m + 2c) = 0.5. The logit transformation can then be
calculated directly in terms of the adjusted proportions or percentages or as
$z = \log_e[(y + c)/(m + c - y)]$. Here, the offset c is usually chosen to be equal to 0.5 or 1, but other
offsets can be used. If the responses are recorded as proportions or percentages directly (so p or P is
known but y and m are not), then the adjusted transformation takes the form $z = \log_e[(p + c)/(1 + c - p)]$,
or $z = \log_e[(P + c)/(100 + c - P)]$, where c is chosen to be the minimum of two values:
the difference between 0 and the smallest observation, and the difference between 1
(or 100 when using percentages) and the largest observation.
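
As a minimal sketch of the offset calculation, assuming counts y out of m are available: the function name `empirical_logit` and the default c = 0.5 are illustrative choices, not notation from the text.

```python
import math

def empirical_logit(y, m, c=0.5):
    """Logit of y out of m with a symmetric offset c > 0: log((y + c)/(m + c - y))."""
    return math.log((y + c) / (m + c - y))

m = 25
for y in (0, 5, 20, 25):
    adjusted_p = (y + 0.5) / (m + 2 * 0.5)
    print(f"y = {y:2d} of {m}: adjusted p = {adjusted_p:.3f}, z = {empirical_logit(y, m):+.3f}")
```

With the offset, y = 0 and y = m give finite values that are equal in size and opposite in sign, and the centre of the range (y = m/2) still maps to zero.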

The quantity p/(1 - p) is sometimes called the odds, so that the logit transformation corresponds
to the logarithm of the odds, or log-odds. Results of an analysis with the logit
transformation are therefore sometimes interpreted in terms of a change in the odds, or
odds ratios. Whilst this is an interpretation that is commonly used in medical statistics
and in the betting industry, it is an interpretation that is often difficult to relate to the
biological background of an analysis, and so we generally avoid this form of interpretation.

We discuss alternatives to the logit transformation in Sections 6.2.4 and 6.5.



6.2.4 Other Transformations 

Many other transformations have been suggested for data analysis, and these are often 
related to a physical interpretation of the measurement scale. For example, a cube root 
transformation might be considered for volumes, as the transformed response could be 
related to average size in one dimension. Or a reciprocal transformation might be consid- 
ered for growth rates measured as mm/day, as the transformed response could be inter- 
preted as the number of days required to grow 1 mm. In an ideal case, a transformation 
will give an interpretable physical representation of the response as well as enabling it to 
satisfy the assumptions of the analysis so that the conclusions are valid. 
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The family of power transformations, defined as 

$$z = y^{\lambda} \ \text{for}\ \lambda \neq 0 \quad \text{and} \quad z = \log_e(y) \ \text{for}\ \lambda = 0,$$

encompasses many of the transformations commonly used for positive responses. The
parameter $\lambda$ is known as the power parameter. When $\lambda$ = 0, 0.5, 1 and -1, the resulting
transformations are equivalent to a natural logarithm, square root, simple linear transformation
and reciprocal transformation, respectively. The Box-Cox transformation
provides a method for deciding the best power transformation to use to obtain an approximate
Normal distribution, but a full description of this approach is outside the scope of
this book. For more details, see Sokal and Rohlf (1995).
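
The power family as defined here can be written as a single function. This is a sketch only: `power_transform` is a hypothetical name, and it implements the simple family given above rather than the rescaled Box-Cox form.

```python
import math

def power_transform(y, lam):
    """Power family from the text: y**lam for lam != 0, log_e(y) for lam == 0."""
    if y <= 0:
        raise ValueError("the power family is applied to positive responses only")
    return math.log(y) if lam == 0 else y ** lam

y = 16.0
for lam, name in [(0, "natural log"), (0.5, "square root"), (1, "linear"), (-1, "reciprocal")]:
    print(f"lambda = {lam:>4}: {name:12s} -> {power_transform(y, lam):.4f}")
```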

For proportions or percentages, the most common alternative to the logit is the arcsine, or
angular transformation, defined as $z = \arcsin(\sqrt{p})$ or $z = \arcsin(\sqrt{P/100})$. Other possibilities
include the probit and complementary log-log transformations. Further details on these
alternative transformations can be found in Sokal and Rohlf (1995).



6.3 Interpreting the Results after Transformation 

Following the analysis of a transformed response (e.g. Example 6.1A), interpretation of the 
results is often aided if they can be represented on the scale of the original measurements. 
Unfortunately, standard errors and other measures of variability (i.e. SEMs, SEDs and LSDs) 
cannot be back-transformed directly because most transformations impose a non-linear 
re-scaling so that the size of the back-transformed error should differ according to the pre- 
dicted value(s) with which it is associated. This means that the comparison of the difference 
between two predictions based on an appropriate SED must be made on the transformed 
scale. However, the limits of confidence intervals for predictions of treatment population
means or differences (see Section 4.4) derived on the transformed scale can be back-transformed,
along with the predicted value, for presentation and interpretation on the original
scale. Note that whilst confidence intervals on the transformed scale are symmetric about 
the estimated value, they are usually asymmetric on the back-transformed scale. 

Formulae for the transformations and back-transformations corresponding to the log,
logit and square root functions are presented in Table 6.4. Because of the importance of the 
log transformation, we discuss the interpretation of back-transformed predictions from 
the log scale in detail in Section 6.4. 

Note that, even when a transformation appears to have been successful, when an offset 
has been included then the back-transformation may lead to estimates outside the valid 
range of the original variable (e.g. negative values for counts, values exceeding one for 
proportions, etc.), which is clearly undesirable. In this case, other methods should be used 
(see Section 6.5). 

EXAMPLE 6.1B: BEETLE MATING 

Table 6.5 shows back-transformed values of the treatment sample means and 95% 
confidence limits calculated from the formulae in Section 4.4. For example, the 95% 
confidence interval for P. vitellinae × interspecies mating (treatment 1) calculated on
the $\log_{10}$ scale requires the predicted population mean, its SEM, and the 97.5th percentile
of the t-distribution with 36 df, given by
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TABLE 6.4 



Common Transformations and Their Inverses (Back-Transformations)

Transformation                     Description                           Back-Transformation
z = log10(y)                       Common logarithm                      y = 10^z
z = loge(y)                        Natural logarithm                     y = e^z
z = log10(y + c)                   Common logarithm with offset c        y = 10^z - c
z = loge(y + c)                    Natural logarithm with offset c       y = e^z - c
z = loge(y/(m - y))                Logit                                 y = m e^z/(1 + e^z)
z = loge[(y + c)/(m - y + c)]      Logit with offset c                   y = [(m + c) e^z - c]/(1 + e^z)
z = loge[(P + c)/(100 - P + c)]    Logit of percentages with offset c    P = [(100 + c) e^z - c]/(1 + e^z)
z = √y                             Square root                           y = z^2
z = √(y + c)                       Square root with offset c             y = z^2 - c




TABLE 6.5

Predicted Means (Middle Value) and Lower and Upper 95%
Confidence Limits (First and Third Values, Respectively) on
the Back-Transformed Scale for the Beetle Mating
Experiment (Example 6.1B)

                       Mating Type
Species of female      Interspecies          Intraspecies
P. vit.                25.9, 32.6, 40.9      50.7, 63.6, 79.9
P. vulg.               64.6, 81.1, 101.8     81.2, 102.0, 128.1



$$\widehat{\text{Treatment}}_1 = 1.513; \quad \text{SEM} = 0.0488; \quad t_{36} = 2.028.$$

The CI is then computed as 1.513 ± (2.028 × 0.0488) = (1.414, 1.612). These limits are then
back-transformed to give ($10^{1.414}$, $10^{1.612}$) = (25.9, 40.9). The predicted mean is back-transformed
similarly, i.e. $10^{1.513}$ = 32.6. Note that the back-transformed mean is smaller than
the midpoint of the confidence interval (which is 33.4), and is slightly smaller than the
treatment mean calculated from the original data ($\bar{y}_{1\cdot}$ = 34.9, Table 6.1); this is discussed
further in Section 6.4.
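
The arithmetic of this example can be reproduced as follows. This is a sketch using only the values quoted in the example; the variable names are our own, and the t percentile is entered as the constant given in the text rather than computed.

```python
# Values quoted in Example 6.1B (log10 scale, treatment 1).
mean_log10 = 1.513   # predicted mean
sem = 0.0488         # standard error of the mean
t_crit = 2.028       # 97.5th percentile of t with 36 df, as given in the text

half_width = t_crit * sem
lower_log10 = mean_log10 - half_width
upper_log10 = mean_log10 + half_width

# Back-transform the prediction and each limit separately (never the SEM itself).
mean_bt = 10 ** mean_log10
lower_bt = 10 ** lower_log10
upper_bt = 10 ** upper_log10
midpoint_bt = (lower_bt + upper_bt) / 2

print(f"log10 scale: ({lower_log10:.3f}, {upper_log10:.3f})")
print(f"original scale: {mean_bt:.1f} with 95% CI ({lower_bt:.1f}, {upper_bt:.1f})")
print(f"CI midpoint: {midpoint_bt:.1f}")
```

The back-transformed interval is asymmetric about the back-transformed mean, which is why the mean falls below the midpoint of the interval.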

It is possible to obtain an approximation of the standard errors (SEMs or SEDs) for the 
means on the back-transformed scale. These can be obtained by the delta method, which 
is outside the scope of this book, but is commonly used for non-linear models (for more 
details, see Casella and Berger, 2002). However, there is no guarantee that this method will
provide adequate estimates, and we do not recommend its use in the current context. 



6.4 Interpretation for Log-Transformed Responses 

In the case where a log transformation is appropriate, so that all assumptions of the 
analysis are met by the log-transformed response, we can interpret the back-transformed 
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predictions in terms of a multiplicative model. This connection exists because the sum of 
logarithms of two (or more) values is equal to the logarithm of their product, expressed as

$$\log(a) + \log(b) = \log(a \times b).$$

Now consider the CRD model (Equation 4.1) as applied to a $\log_e$-transformed response,
$z_{jk} = \log_e(y_{jk})$, which takes the form

$$z_{jk} = \mu_j + e_{jk}.$$

We back-transform this model by applying the exponential function to both sides of the
equation (see Table 6.4), to get

$$\exp(z_{jk}) = \exp(\mu_j + e_{jk}). \qquad (6.1)$$

From the original transformation, we know that $\exp(z_{jk}) = y_{jk}$. We can simplify the expression
on the right-hand side further by using the mathematical property that

$$\exp(a + b) = \exp(a) \times \exp(b),$$

i.e. the exponential of a sum is equal to the product of the exponentials of the components
of the sum. We can therefore rewrite Equation 6.1 as

$$y_{jk} = \exp(\mu_j) \times \exp(e_{jk}).$$

This is now a multiplicative model on the natural scale: the components of the model are
multiplied together rather than added together, with the log transformation providing this
change in the form of relationship.

The predicted values from this model are the treatment means formed on the $\log_e$ scale.
The predicted population mean for the jth treatment is calculated as

$$\hat{\mu}_j = \bar{z}_{j\cdot} = \frac{1}{n}\sum_{k=1}^{n} \log_e(y_{jk}) = \log_e(\dot{y}_{j\cdot}),$$

which is the natural logarithm of the geometric mean for the jth treatment (here denoted
$\dot{y}_{j\cdot}$, see Mathematical Aside 6.1). The back-transform of this prediction is therefore simply
the geometric mean with respect to the original responses for the jth treatment group. One
characteristic of the geometric mean is that it is always smaller than or equal to the corresponding
arithmetic mean. The difference depends on the skewness of the sample: the more
right-skewed the sample, the larger the discrepancy in these two measures of location. It
follows that the back-transformed prediction for any treatment will always be smaller than
(or equal to) the arithmetic sample mean for that treatment in the original data (as in Example 6.1B).
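
The relationship between the two means can be illustrated with a short sketch. The counts below are invented for illustration and are not data from the beetle experiment; the function names are our own.

```python
import math

def arithmetic_mean(values):
    return sum(values) / len(values)

def geometric_mean(values):
    """Back-transform of the mean of the natural logs; values must be positive."""
    return math.exp(arithmetic_mean([math.log(v) for v in values]))

# Hypothetical egg counts for one treatment, invented for illustration.
counts = [12, 20, 25, 31, 40, 95]

print(f"arithmetic mean: {arithmetic_mean(counts):.2f}")
print(f"geometric mean:  {geometric_mean(counts):.2f}")
```

The single large count pulls the arithmetic mean upwards much more than the geometric mean, so the gap between the two widens as the sample becomes more right-skewed.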

Differences between treatment predictions on the $\log_e$ scale can be interpreted in terms
of ratios on the back-transformed scale, as a difference on the $\log_e$ scale between the jth
and kth treatments, written as $\hat{\mu}_j - \hat{\mu}_k$, is back-transformed as

$$\exp(\hat{\mu}_j - \hat{\mu}_k) = \exp(\hat{\mu}_j)/\exp(\hat{\mu}_k) = \dot{y}_{j\cdot}/\dot{y}_{k\cdot},$$

using the general rule that $\exp(a - b) = \exp(a)/\exp(b)$. The back-transformation of a predicted
difference between two treatments on the $\log_e$ scale is therefore equivalent to the
ratio of the corresponding geometric means on the untransformed scale. We can also
interpret the confidence interval (CI) for a difference in population means on the
back-transformed scale. The limits of a $100(1 - \alpha)\%$ CI take the form

$$\left((\hat{\mu}_j - \hat{\mu}_k) - \mathrm{LSD}_{j,k},\ (\hat{\mu}_j - \hat{\mu}_k) + \mathrm{LSD}_{j,k}\right),$$

with back-transform

$$\left(\exp(\hat{\mu}_j - \hat{\mu}_k)/\exp(\mathrm{LSD}_{j,k}),\ \exp(\hat{\mu}_j - \hat{\mu}_k) \times \exp(\mathrm{LSD}_{j,k})\right).$$

The quantity $\exp(\mathrm{LSD}_{j,k})$ can therefore be interpreted as a multiplicative factor giving a
range of plausible values on the original scale in terms of a percentage decrease or increase.
This is illustrated in Example 6.1C.

If the underlying physical process is multiplicative, then the log transformation maps
this onto an additive form, congruent with the form of the linear model. It is fairly
common to find that an interaction that is statistically significant on the original scale
becomes non-significant on the log scale. The logit transformation may play a similar
role in transforming proportions or percentages onto a scale where an additive model
is more appropriate.

Similar properties hold for log transformation to any base by substituting in the appropriate
back-transformation function, for example, for the $\log_{10}$ transformation, substitute
the function $10^z$ in place of $\exp(z)$.

EXAMPLE 6.1C: BEETLE MATING 

The results from the transformed analysis in Example 6.1A indicate that there are differ- 
ences between treatments - we now wish to interpret these differences with respect to 
a multiplicative model. On the $\log_{10}$ scale, the difference between the predicted population
means (Table 6.3) for P. vulg. × intraspecies mating (treatment 4) and P. vit. × interspecies
mating (treatment 1) is

$$\widehat{\text{Treatment}}_4 - \widehat{\text{Treatment}}_1 = 2.008 - 1.513 = 0.496.$$

As expected, the back-transform of this difference ($10^{0.496}$ = 3.13) is equivalent to the
ratio of the geometric means on the original scale for each treatment (Table 6.1), with

$$\frac{\dot{y}_{4\cdot}}{\dot{y}_{1\cdot}} = \frac{102.0}{32.6} = 3.13.$$



We therefore estimate that, on average, when both were mated with P. vulgatissima 
males, P. vulgatissima females laid 3.13 times as many eggs as P. vitellinae females. 

The SED on the $\log_{10}$ scale is 0.0690 with 36 df, and the 97.5th percentile of the t-distribution
is $t_{36}$ = 2.028 (Example 6.1B), so the 5% LSD is equal to SED × $t_{36}$ = 0.0690 × 2.028 = 0.1399.
We can calculate a 95% CI for this difference on the transformed scale as

$$(\widehat{\text{Treatment}}_4 - \widehat{\text{Treatment}}_1) \pm \mathrm{LSD} = 0.496 \pm 0.1399 = (0.356, 0.636),$$

with back-transform ($10^{0.356}$, $10^{0.636}$) = (2.27, 4.32). The back-transform of the LSD is
$10^{0.1399}$ = 1.38, and it is straightforward to verify that the CI limits are equal to the
back-transformed difference (3.13) divided or multiplied by 1.38 (an increase to 138% or
decrease to 100/1.38 = 72% of the predicted value). Our 95% confidence interval therefore
states that P. vulgatissima females lay between 2.27 and 4.32 times as many eggs as
P. vitellinae females when both are mated with P. vulgatissima males.
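
The multiplicative interpretation in this example can be checked with a few lines of arithmetic. This is a sketch using only the values quoted above; the variable names are our own.

```python
# Values quoted in Example 6.1C (log10 scale).
difference = 0.496   # treatment 4 minus treatment 1
lsd = 0.1399         # 5% LSD = SED x t = 0.0690 x 2.028

ratio = 10 ** difference    # ratio of geometric means on the original scale
factor = 10 ** lsd          # multiplicative factor defining the interval
lower, upper = ratio / factor, ratio * factor

print(f"ratio = {ratio:.2f}, factor = {factor:.2f}")
print(f"95% CI for the ratio: ({lower:.2f}, {upper:.2f})")
print(f"decrease to {100 / factor:.0f}% or increase to {100 * factor:.0f}% of the ratio")
```

Dividing and multiplying the back-transformed difference by the back-transformed LSD gives exactly the same limits as back-transforming the two ends of the interval on the log scale.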

We shall return to this example in Chapter 8 where we look at the structure of the four 
treatments in more detail. 



Mathematical Aside 6.1 



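The geometric mean for the jth treatment group with replication n is defined as

$$\dot{y}_{j\cdot} = (y_{j1} \times y_{j2} \times \cdots \times y_{jn})^{1/n} = \left(\prod_{k=1}^{n} y_{jk}\right)^{1/n},$$

where the symbol '$\prod$' denotes the product of the values over the specified indices. It follows
that the arithmetic mean of log-transformed values is equal to the logarithm of the
geometric mean of the untransformed values, or

$$\frac{1}{n}\sum_{k=1}^{n}\log(y_{jk}) = \frac{1}{n}\log(y_{j1} \times y_{j2} \times \cdots \times y_{jn}) = \log\left[\left(\prod_{k=1}^{n} y_{jk}\right)^{1/n}\right] = \log(\dot{y}_{j\cdot}).$$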



6.5 Other Approaches 

If a transformation is successful, so that all the assumptions of the analysis are satisfied,
with good residual plots produced, then the results are likely to be reliable. However, in
many circumstances this is not achievable. Some common situations where transformation
is unlikely to be successful include

• Counts with many small or zero values 

• Proportions calculated with respect to small samples (< 10) 

• Proportions (or percentages) with many values at or close to the limits 0 or 1 (0 or 100) 

• Proportions calculated with respect to samples of different sizes

In these cases, there is a better alternative to the use of transformations, which is the use
of generalized linear models (GLMs) based on the assumption of a distribution other than
Normal for the response. For example, insect counts might be assumed to follow a Poisson
distribution, which is defined for zero and positive integers, and is characterized by a variance
equal to the expected response (so that as the mean increases so does the variance).
When proportion responses have been calculated with respect to a sample of entities, the
original counts might be assumed to follow a Binomial distribution. GLMs provide a flexible
framework of models that use one of several statistical distributions for the response
(including the Normal) and can take all features of the data into account. They also have
the advantage that transformation of the model to an additive scale can be made independently
of distributional assumptions (McCullagh and Nelder, 1989).

In other cases, there might be no simple alternative approach. In these cases, particularly
when the sample size is small, so that it is difficult to deduce what distribution the response
might follow, or when the responses are ranks or scores, it might be preferable not to make
any distributional assumptions at all, in which case non-parametric methods are appropriate.
The permutation tests described in Section 5.2.4 can also be useful in this context.

We postpone a discussion of GLMs until Chapter 18. For now, we simply state that all is
not lost if your responses do not satisfy, and cannot be transformed to satisfy, the assumptions
underlying the linear model and ANOVA.

EXERCISES 

6.1 A study was conducted to estimate the abundance of rye-grass on three different
sites. At each site, quadrats of size 0.1 m² were randomly placed (12 at
sites 1 and 2, and 24 at site 3) and the number of rye-grass plants in each quadrat
was recorded. The unit number (DQuadrat), site (factor Site) and rye-grass
count (variate Count) for each quadrat are in file ryegrass.dat. The objective
of the study was to determine whether the abundance of rye-grass differed
among the three sites. Plot the observed data and analyse them using a
one-way ANOVA. Are there any indications that the data require transformation?
Analyse the data on an appropriate alternative scale. Is there any evidence of
site differences?*

6.2* A pilot study investigated the pattern of an insect pest (beetle) entering a susceptible
field crop. It was suspected that the beetles entered the crop from the edge
of the field and then progressed towards the centre. One field was surveyed
periodically and, once the beetles were present in reasonable numbers, a transect
was taken from the edge towards the centre of the field with samples taken
at 2 m intervals. At each distance, beetle counts were made from four randomly
selected plants, giving replicate measurements at each distance. The file
transect.dat contains the unit numbers (DPlant), distances (factor fDist) and beetle
counts (variate Count). Analyse these data, using a transformation if necessary,
to investigate whether there is any evidence that beetle numbers vary between
sampling distances. What other hypotheses might you like to test?

6.3 A field experiment was carried out to investigate the effects of amount and timing
of sulphur application on the level of scab disease in potatoes (Cochran and
Cox, 1957, Table 4.1). Three doses of sulphur were used (300, 600 and 1200 lb/acre)
and these were applied in either spring or autumn. Plots with no sulphur
application were included as controls, giving seven treatments in total.
The control treatment was replicated eight times and the six sulphur treatments
each four times in a CRD with 32 plots in a four-row × eight-column layout.
The average percentage surface area with scab for 100 potatoes per plot is the
response to be analysed. The unit numbers (Plot), treatments applied (factor
Treatment) and responses (variate Scab) can be found in file scab.dat. Analyse
these data on an appropriate scale using one-way ANOVA to compare the seven
treatments. Is there any evidence that the application of sulphur affects the incidence
of scab? (We re-visit these data in Exercises 8.5 and 11.4.)

* Data from S. Moss, Rothamsted Research.

6.4 Re-analyse the data of Example 6.1 (Table 6.1 and file beetles.dat) using one-way
ANOVA with a square root rather than logarithmic transformation. Compare
the two analyses. Which transformation is most appropriate?

6.5 An experiment investigated detection of IgG antibodies ingested by parasitoids
with enzyme-linked immunosorbent assay (ELISA). At the start of the experiment,
parasitoids were fed either honey spiked with antibodies or normal honey
(negative control, labelled Control). Those fed spiked honey were either tested
immediately afterwards (positive control to check that the antibodies had been
ingested, labelled Day0) or after one, two or three days (labelled Day1, Day2,
Day3, respectively), having been fed on normal honey in the interim. The five
treatments were each allocated at random to 10 parasitoids as a CRD and the
insect samples were placed into 50 wells of a standard 96-well microplate for
testing. The resulting optical density readings (variate Optical Density) are in
file parasitoids.dat with the unit number of each parasitoid (DParasitoid) and
the treatment to which it was allocated (factor Treatment).

The main aim of the experiment was to assess for how long after ingestion
the antibodies could be detected, i.e. comparisons between the negative control
treatment and samples after one, two or three days. Analyse these data appropriately
using one-way ANOVA and discuss whether this aim can be fully realized.
What conclusions can you draw?†

6.6 The concentrations of several trace metals in a region of the Swiss Jura were
quantified by a survey of soil samples at 366 sites (Atteia et al., 1994). The metals
measured (in mg/kg) included cadmium (Cd), chromium (Cr), copper (Cu) and
zinc (Zn). The full data set was published in Goovaerts (1997). Here, we consider
a subset of 207 sample points on a square grid with approximately 250 m spacing.
The land use at each sample point was classified into one of three categories
(1 = forest, 2 = pasture, 3 = meadow). The unit number (DSample), spatial
location (x- and y-coordinates in variates X and Y, respectively) and land-use
category (factor LandUse) for each sample can be found in file metals.dat
along with the concentrations of each metal at each location (variates Cd, Cr,
Cu and Zn). Analyse the concentration of each metal on an appropriate scale to
determine if there are differences among the land types. Are there any metals
for which you cannot come to a reasonable conclusion? Plot the co-ordinates of
the spatial locations, and consider how you might look for spatial dependence
in the residuals. Can you implement your idea? Is there any evidence of spatial
dependence?‡

† Data from M. Torrance, Rothamsted Research.

‡ Data from R. Webster, Rothamsted Research & previously Ecole Polytechnique Fédérale de Lausanne.
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The design principle of blocking, used to control for known or expected heterogene- 
ity (variability) among experimental units, was introduced in Section 3.1.3. The basic 
approach is to group, or block, together sets of experimental units expected to have similar 
responses in the absence of different treatments, and to separate those units expected to 
have different responses. Blocking is frequently used in designed experiments to account 
for heterogeneity due to the location or timing of measurements. For example, in a glass- 
house experiment (in the northern hemisphere), we might expect plots closer to the south 
wall of a compartment to be warmer than plots closer to the north wall, and so, we group 
our experimental units based on their distance from the south wall. Similarly, if samples 
have to be processed, but only half can be done in the morning and the remainder done 
in the afternoon, then the morning and afternoon sessions might be used as two blocks to 
guard against systematic differences caused by any change in the background conditions, 
or a change of the experimenter. Blocking is also widely used in observational studies. 
For example, if an ecological study makes observations of the species present on pairs of 
fields (e.g. one growing wheat, another growing oilseed rape) on several farms, then the 
farms can be included as a blocking structure to account for the many expected differ- 
ences (caused by a combination of location and management practices) between farms. 
Full specification of an experiment therefore requires knowledge of both the blocking and 
treatments present. In developing ideas for designs with blocking, we consider a single set 
of treatments, which may mean either imposed treatments in a designed experiment or 
groups in an observational study, as in the previous chapters. 

The simplest layout that includes some form of blocking is the RCBD which was intro- 
duced in Section 3.3.2. In this design, the size of each block is equal to the number of 
treatments, with each treatment occurring exactly once in each block, and with treatments 
allocated at random to the units within each block (i.e. an independent randomization 
for each block). This chapter begins by describing the analysis of data from a RCBD. The 
first step in the analysis is to write down a model for the data (Section 7.1) and to obtain 
estimates of the model parameters (Section 7.2). A simple ANOVA is then used to obtain 
an estimate of the background variation and to test whether there are real differences 
between the treatments or groups (Section 7.3). These results can be combined to examine 
the treatment means together with appropriate estimates of error (Section 7.4). While in the 
analysis of the CRD (Chapter 4), there was only one factor to consider, in the analysis of the 
RCBD, there is a block factor in addition to the treatment factor. It is important to realize 
that within the model, the status of these two factors is different: the block factor is con- 
cerned with the structure (heterogeneity) of the units, and corresponds to the structural 
component of the model, while the treatment structure defines the different treatments 
(or treatment combinations) applied to the units, and corresponds to the explanatory component
of the model (Section 1.3). Hence, the structural component allows us to assess
different sources of natural variation among the experimental units and the explanatory 
component provides information about the differences in response caused by the different 
treatments, in particular allowing us to estimate the sizes of these differences. Recognition 
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of the different roles played by these two components leads to the idea of a multi-stratum 
ANOVA which makes explicit the separation between them (Section 7.5). The translation 
of the simple ANOVA into a multi-stratum ANOVA can be easily demonstrated for the 
RCBD, with the idea then extended for more complex designs. The benefit of the multi- 
stratum ANOVA is that, given the correct specification of the structural and explanatory 
components for an experiment, the correct analysis follows. For example, this enables the 
automatic recognition of sub-sampling or pseudo-replication. 
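
The randomization of an RCBD described above (an independent randomization of the full treatment set within each block) can be sketched as follows. The function name `randomize_rcbd`, the treatment labels and the seed are illustrative assumptions, not part of the original text.

```python
import random

def randomize_rcbd(treatments, n_blocks, seed=None):
    """Return one independently shuffled treatment order per block."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)
        rng.shuffle(order)      # independent randomization within this block
        layout.append(order)
    return layout

# Hypothetical design: four treatments in three blocks.
layout = randomize_rcbd(["A", "B", "C", "D"], n_blocks=3, seed=1)
for block, order in enumerate(layout, start=1):
    print(f"Block {block}: {order}")
```

Each block contains every treatment exactly once, and the position of each treatment within a block is determined separately for each block. Recording the seed, or the resulting layout, keeps the full design recoverable.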



7.1 Defining the Model 

In the model for a RCBD, there are now two factors to consider: one defining the block to 
which each experimental unit is allocated, and the other defining the treatment applied to 
each unit. We use a general notation that can be applied to any data set. Suppose there are 
t treatments, with each treatment equally replicated n times, such that there are n blocks, 
each consisting of t units with each treatment occurring once in each block. The simplest 
notation for the RCBD labels each observation by its block (index i) and treatment allocations
(index j). Then $y_{ij}$ represents the observation on the jth treatment in the ith block and
the full set of observations can be denoted as $y_{ij}$, i = 1 ... n, j = 1 ... t (see Section 2.1 for an
overview of notation). The total number of observations is N = n × t. Extending the notation
presented for the CRD in Section 4.5, we can write the linear model for the data from
a RCBD as 



$$y_{ij} = \mu + b_i + \tau_j + e_{ij}, \qquad (7.1)$$

where $\mu$ represents the overall population mean, $b_i$ is the effect of the ith block (as a difference
from the overall mean) and $\tau_j$ is the effect of the jth treatment (again as a difference
from the overall mean), with deviations $e_{ij}$ reflecting individual variation about
the population values. In this model, the population mean for the jth treatment can be
derived as $\mu + \tau_j$. The assumptions about properties of the deviations given in Section 4.1,
including independence, homogeneity of variances and a Normal probability distribution,
again apply to this model. As the treatment and block effects are expressed as differences
from the overall population mean, it follows that they require the constraints $\sum_j \tau_j = 0$ and
$\sum_i b_i = 0$. This is the sum-to-zero parameterization introduced in Section 4.5. Note that we
use an italic Roman symbol ($b$) to denote block effects to emphasize that they are part of
the structural component; we reserve Greek symbols ($\tau$) to denote treatment effects in the
explanatory component.

Labelling the units by their block and treatment allocation retains the simplicity of the 
notation as introduced for the CRD (Section 4.1), but again, information is lost with regard 
to the experimental layout, i.e. the randomized allocation of treatments to plots within 
blocks. We might alternatively write the model in terms of the block, plot and treatment 
relevant to each unit, but this extended notation is both more cumbersome and introduces 
an element of redundancy (given the design, we do not need to know both the plot number 
and the treatment applied to that plot). Therefore, we continue to use the simpler notation, 
but restate the importance of retaining all information in a data set, i.e. factors defining the 
blocks and units within blocks, as well as the treatment allocation, so that the full layout 
can be reconstructed when required. 

Using our symbolic notation, we can write the model from Equation 7.1 as

Response variable: Y

Explanatory component: [1] + Treatment
Structural component: Block/Unit

where the variate Y holds the observed response, factor Treatment gives the allocation
of observations to treatment groups, factor Block gives the allocation of observations to
blocks and factor Unit labels the units within blocks. The term Block.Unit is associated
with the deviations, $e_{ij}$. The term [1] was introduced in Section 4.5, and denotes a factor that
takes value 1 everywhere, and is associated with the overall population mean, $\mu$.

The model in Equation 7.1 is based on the assumption that the expected difference 
between any two treatments is the same in each block. In statistical parlance, we say that 
there is no interaction between blocks and treatments (the term 'interaction' is discussed in 
detail in Section 8.2), meaning that the treatment differences are independent of the block 
effects. In some cases, this assumption is not reasonable. For example, if blocks in a field
experiment are assigned according to the soil's pH, and the treatments are expected to 
react differently according to the soil's pH, then strictly, this model - and hence this design 
- is inappropriate for the experiment (although see the remarks at the end of Section 7.3). 
An alternative design for this situation might have two replicates of each treatment within 
each block, allowing for estimation of treatment effects, effects of soil pH and the interac- 
tion between these factors, but a full consideration of this design approach is beyond the 
scope of this book (see, e.g. Mead et al., 2012, Chapter 7). 

EXAMPLE 7.1A: POTATO YIELDS* 

A field experiment designed as a RCBD to investigate the effects of four different types
of fungicides (F1, F2, F3 and F4) on the yield of potatoes compared with untreated plots
(negative control) was described in Example 3.5. The experiment was laid out as four
blocks ($n = 4$) of five plots each ($t = 5$) with 20 units in total ($N = 20$). The plot yields are
shown in field layout in Table 7.1 and can also be found in the file potato.dat, which
contains blocking factors Block (four levels) and Plot (five levels), treatment factor
Fungicide (five levels, with labels 1 ... 5 corresponding to the control, F1, F2, F3 and F4
treatments, respectively) and response variate Yield.

The model for these data can be written in the mathematical form of Equation 7.1 as

$$\mathit{Yield}_{ij} = \mu + \mathit{Block}_i + \mathit{Fungicide}_j + e_{ij} ,$$



TABLE 7.1

Field Layout for the Potato Yields Trial with Potato Yield for Each
Plot (Example 7.1A and File potato.dat)

           Plot 1     Plot 2     Plot 3     Plot 4     Plot 5
Block 1    F3         Control    F2         F1         F4
           642        377        633        527        623
Block 2    F2         Control    F3         F4         F1
           600        408        708        550        604
Block 3    Control    F2         F3         F4         F1
           500        650        662        562        606
Block 4    F3         F2         F1         Control    F4
           504        567        533        333        667

with $\mathit{Yield}_{ij}$ representing the yield from the $j$th treatment in the $i$th block, $\mathit{Fungicide}_j$
($j = 1 \ldots 5$) representing the effects of control and fungicides F1, F2, F3, F4, respectively,
and $\mathit{Block}_i$ ($i = 1 \ldots 4$), representing the effects of the four blocks. As above, $\mu$ represents
the overall population mean and $e_{ij}$ is the deviation for the $j$th treatment in the $i$th block.
In symbolic form, this is written as

Response variable: Yield

Explanatory component: [1] + Fungicide

Structural component: Block/Plot



7.2 Estimating the Model Parameters 

The parameters associated with the RCBD model in Equation 7.1 are the overall population
mean, $\mu$, the treatment effects $\tau_j$, $j = 1 \ldots t$, and the block effects $b_i$, $i = 1 \ldots n$. The fitted
value for the $ij$th observation, i.e. for the $j$th treatment in the $i$th block, denoted by $\hat{y}_{ij}$,
consists of all components of the model except the deviations, with parameters replaced by
their estimates, so that

$$\hat{y}_{ij} = \hat{\mu} + \hat{b}_i + \hat{\tau}_j .$$

We again estimate the parameters using the principle of least squares (see Section 4.2),
by minimizing the residual sum of squares (ResSS)

$$\mathrm{ResSS} = \sum_{i=1}^{n} \sum_{j=1}^{t} (y_{ij} - \hat{y}_{ij})^2 = \sum_{i=1}^{n} \sum_{j=1}^{t} (y_{ij} - \hat{\mu} - \hat{b}_i - \hat{\tau}_j)^2 ,$$

subject to the constraints on the parameters, $\sum_i b_i = 0$ and $\sum_j \tau_j = 0$. As a result of this
procedure, the overall population mean, $\mu$, for data from a RCBD is estimated by the sample
grand mean,

$$\hat{\mu} = \bar{y} ,$$

and the effect of the $j$th treatment is estimated by the difference between the sample mean
for that treatment and the sample grand mean,

$$\hat{\tau}_j = \bar{y}_{.j} - \bar{y} ,$$

with the dot notation as introduced in Section 2.1. Similarly, the effect of the $i$th block is
estimated by the difference between the sample mean for that block and the sample grand
mean,

$$\hat{b}_i = \bar{y}_{i.} - \bar{y} .$$

The population mean for the $j$th treatment, denoted as $\mu_j$, is then estimated as the sum of
the estimates of the overall mean and the treatment effect, as

$$\hat{\mu}_j = \hat{\mu} + \hat{\tau}_j = \bar{y} + (\bar{y}_{.j} - \bar{y}) = \bar{y}_{.j} ,$$

i.e. the sample mean for the $j$th treatment. Similarly, the fitted value for each observation
can be calculated as the sum of the estimates for all components of the model except the
deviations,

$$\hat{y}_{ij} = \hat{\mu} + \hat{b}_i + \hat{\tau}_j = \bar{y} + (\bar{y}_{i.} - \bar{y}) + (\bar{y}_{.j} - \bar{y}) = \bar{y}_{i.} + \bar{y}_{.j} - \bar{y} .$$

Finally, the simple residuals are calculated as the discrepancy between the observations
and the fitted values, namely

$$\hat{e}_{ij} = y_{ij} - \hat{y}_{ij} = y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y} .$$



EXAMPLE 7.1B: POTATO YIELDS* 

Table 7.2 lists the plot yields classified by blocks and treatments, with the block and 
treatment sample means and the sample grand mean. From these values and the for- 
mulae above, we can calculate the parameter estimates. In particular, the estimated 
population means for the treatments are equal to the treatment sample means given 
in Table 7.2. For example, the population mean for the control (treatment 1) is estimated
as

$$\hat{\mu}_1 = \hat{\mu} + \widehat{\mathit{Fungicide}}_1 = 562.8 + (404.5 - 562.8) = 404.5 .$$

It appears that the mean yields for the four fungicide treatments are similar (between
567.5 and 629.0) and all are much greater than that for the untreated control (404.5).
However, to draw sound conclusions about these differences, information on 
the background variation is required to calculate SEDs and hence LSDs for these 
comparisons. 



TABLE 7.2

Plot Yields of Potatoes from a RCBD with Block and Treatment Means (Example 7.1B)

Block                               Control    F1       F2       F3       F4       Block Mean ($\bar{y}_{i.}$)
1                                   377        527      633      642      623      560.4
2                                   408        604      600      708      550      574.0
3                                   500        606      650      662      562      596.0
4                                   333        533      567      504      667      520.8
Treatment mean ($\bar{y}_{.j}$)     404.5      567.5    612.5    629.0    600.5    $\bar{y}$ = 562.8
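
The estimates in Example 7.1B can be checked numerically. The following is a minimal Python sketch, assuming the yields are typed in from Table 7.2 as a nested list (rows are blocks, columns are treatments). The variable names are illustrative only and are not part of the original analysis.

```python
# Plot yields from Table 7.2: rows are blocks 1-4, columns are Control, F1, F2, F3, F4.
yields = [
    [377, 527, 633, 642, 623],
    [408, 604, 600, 708, 550],
    [500, 606, 650, 662, 562],
    [333, 533, 567, 504, 667],
]
treatments = ["Control", "F1", "F2", "F3", "F4"]

n = len(yields)      # number of blocks
t = len(yields[0])   # number of treatments

# Sample grand mean, block means and treatment means.
grand_mean = sum(sum(row) for row in yields) / (n * t)
block_means = [sum(row) / t for row in yields]
treatment_means = [sum(yields[i][j] for i in range(n)) / n for j in range(t)]

# Sum-to-zero effects: differences from the sample grand mean.
block_effects = [m - grand_mean for m in block_means]
treatment_effects = [m - grand_mean for m in treatment_means]

# Fitted value = block mean + treatment mean - grand mean; residual = observed - fitted.
fitted = [[block_means[i] + treatment_means[j] - grand_mean for j in range(t)]
          for i in range(n)]
residuals = [[yields[i][j] - fitted[i][j] for j in range(t)] for i in range(n)]

print(f"grand mean = {grand_mean:.1f}")
for name, mean, effect in zip(treatments, treatment_means, treatment_effects):
    print(f"{name:8s} mean = {mean:6.1f}  effect = {effect:+7.1f}")
```

Because of the sum-to-zero constraints, the estimated block effects and the estimated treatment effects each add up to zero, and the residuals add up to zero within every block and within every treatment.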


7.3 Summarizing the Importance of Model Terms

As for the analysis of data from a CRD, the aim of ANOVA is to partition the total varia- 
tion of the observations, quantified as sums of squares, into several components. For 
the RCBD, there are now two factors classifying the observations, corresponding to the 
blocks and treatments, and the analysis needs to account for the variation due to each 
of these sources. The total variation (TotSS) is therefore partitioned into the variation 
among blocks (BlkSS), the variation among treatments (TrtSS) and the residual variation 
(ResSS), with 



$$\mathrm{TotSS} = \mathrm{BlkSS} + \mathrm{TrtSS} + \mathrm{ResSS} . \qquad (7.2)$$

Because of the difference in status between the block and treatment factors, we describe 
this analysis as one-way ANOVA with blocks (rather than two-way ANOVA, which we 
use to indicate the inclusion of two treatment factors; see Chapter 8). This clearly empha- 
sizes the difference in the status of these two components in the model (see Section 7.5). 

As for the CRD (see Equation 4.3), the total sum of squares (TotSS) is calculated as the
sum, over all observations, of the squared differences between each observation and the
sample grand mean,

$$\mathrm{TotSS} = \sum_{i=1}^{n} \sum_{j=1}^{t} (y_{ij} - \bar{y})^2 .$$

The block sum of squares (BlkSS) measures the variation among blocks, and is calculated
as the sum, over all observations, of the squared differences between the appropriate block
mean and the sample grand mean,

$$\mathrm{BlkSS} = \sum_{i=1}^{n} \sum_{j=1}^{t} (\bar{y}_{i.} - \bar{y})^2 = t \sum_{i=1}^{n} (\bar{y}_{i.} - \bar{y})^2 .$$

Similarly, the treatment sum of squares (TrtSS) is calculated as the sum, over all
observations, of the squared differences between the appropriate treatment mean and the
sample grand mean,

$$\mathrm{TrtSS} = \sum_{i=1}^{n} \sum_{j=1}^{t} (\bar{y}_{.j} - \bar{y})^2 = n \sum_{j=1}^{t} (\bar{y}_{.j} - \bar{y})^2 .$$

The residual sum of squares (ResSS) is calculated as the sum, over all observations, of the
squared differences between the observed and fitted values,

$$\mathrm{ResSS} = \sum_{i=1}^{n} \sum_{j=1}^{t} (y_{ij} - \hat{y}_{ij})^2 = \sum_{i=1}^{n} \sum_{j=1}^{t} (y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y})^2 ,$$

or, by rearranging Equation 7.2, as a simple subtraction of the block and treatment sums of
squares from the total sum of squares,

$$\mathrm{ResSS} = \mathrm{TotSS} - \mathrm{BlkSS} - \mathrm{TrtSS} .$$

Alternatively, the sums of squares can be rewritten in terms of the sums of squared
estimates

$$\mathrm{BlkSS} = t \sum_{i=1}^{n} \hat{b}_i^2 , \qquad \mathrm{TrtSS} = n \sum_{j=1}^{t} \hat{\tau}_j^2 , \qquad \mathrm{ResSS} = \sum_{i=1}^{n} \sum_{j=1}^{t} \hat{e}_{ij}^2 .$$



Again, as for the CRD, there is a corresponding partition of the total degrees of freedom
(TotDF) into those associated with variation among blocks (BlkDF), those associated with
variation among treatments (TrtDF) and the remainder or residual (ResDF), as

$$\mathrm{TotDF} = \mathrm{BlkDF} + \mathrm{TrtDF} + \mathrm{ResDF} . \qquad (7.3)$$

Using the recipe developed in Section 4.3, we can calculate the df associated with each
sum of squares. The calculation of the TotSS uses the $N = n \times t$ observations (which can be
described using a model with $N$ parameters) with just the sample grand mean, an estimate
of a single parameter, used for adjustment; hence

$$\mathrm{TotDF} = N - 1 .$$

The calculation of the BlkSS uses block means ($n$ values) adjusted by the sample grand
mean; so

$$\mathrm{BlkDF} = n - 1 .$$

By a similar argument, we have

$$\mathrm{TrtDF} = t - 1 .$$

Rearranging Equation 7.3, the ResDF can be most easily obtained by subtraction as

$$\mathrm{ResDF} = (N - 1) - (n - 1) - (t - 1) = N - n - t + 1 .$$

Using $N = nt$, we can also write this as

$$\mathrm{ResDF} = nt - n - t + 1 = (n - 1)(t - 1) ,$$

i.e. one less than the number of blocks ($n - 1$) multiplied by one less than the number of
treatments ($t - 1$).
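
As a quick check on this bookkeeping, the partition of the degrees of freedom can be written as a small Python function. This is only an illustrative sketch: the function name and dictionary keys are our own, and the call uses the dimensions of the potato trial from Example 7.1A (four blocks, five treatments).

```python
def rcbd_degrees_of_freedom(n_blocks, n_treatments):
    """Partition the df for a RCBD with one unit per treatment in each block."""
    n_obs = n_blocks * n_treatments          # N = n x t
    df = {
        "Total": n_obs - 1,                  # TotDF = N - 1
        "Block": n_blocks - 1,               # BlkDF = n - 1
        "Treatment": n_treatments - 1,       # TrtDF = t - 1
    }
    # ResDF by subtraction (Equation 7.3 rearranged).
    df["Residual"] = df["Total"] - df["Block"] - df["Treatment"]
    return df


potato_df = rcbd_degrees_of_freedom(n_blocks=4, n_treatments=5)
print(potato_df)

# The subtraction always agrees with the product form (n - 1)(t - 1).
for n in range(2, 8):
    for t in range(2, 8):
        assert rcbd_degrees_of_freedom(n, t)["Residual"] == (n - 1) * (t - 1)
```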

As for the ANOVA for data from a CRD, we can put the sums of squares onto a common
scale by dividing them by their degrees of freedom to produce mean squares. The residual
mean square (ResMS) again provides an estimate of the background variation or noise,
usually denoted as $s^2$. If there are no differences between treatments, then contributions to
the treatment mean square arise from background variation alone and the treatment and
residual mean squares should be of similar sizes, allowing for sampling variation. A similar
argument follows for the comparison of the block mean square with the residual mean
square if there are no differences between blocks. We can formalize these comparisons by
considering the expected value of each of the respective mean squares, which are

$$E(\mathrm{TrtMS}) = \sigma^2 + \frac{n}{t - 1} \sum_{j=1}^{t} \tau_j^2 ,$$

$$E(\mathrm{BlkMS}) = \sigma^2 + \frac{t}{n - 1} \sum_{i=1}^{n} b_i^2 ,$$

$$E(\mathrm{ResMS}) = \sigma^2 .$$



If the true population values of the treatment effects are zero, then the second term in
the expression for E(TrtMS) is zero and the treatment mean square (TrtMS) has the same
expected value as the ResMS. Similarly, if the true population values of the block effects
are zero, then the block mean square (BlkMS) also has the same expected value as the
ResMS. As for the CRD, this property of the TrtMS is the basis for a test of the null
hypothesis of equal treatment population means, written for this parameterization as

$$H_0: \tau_1 = \tau_2 = \ldots = \tau_t = 0 ,$$

i.e. that all the treatment effects (the differences from the overall population mean) are
zero. This is compared with the general alternative hypothesis of some non-zero treatment
effects. The observed F-statistic is calculated as the variance ratio

$$F_{t-1,(n-1)(t-1)} = \mathrm{TrtMS}/\mathrm{ResMS} .$$

If the null hypothesis is true, then we expect the value of the variance ratio to be close
to 1 (because the TrtMS and ResMS have the same expected value) and the test statistic
follows an F-distribution with $t - 1$ (TrtDF) and $(n - 1)(t - 1)$ (ResDF) df. If the observed
statistic $F_{t-1,(n-1)(t-1)}$ is larger than the $100(1 - \alpha_s)$th percentile of this F-distribution, then the
null hypothesis is rejected at significance level $\alpha_s$, and we have evidence of some variation
among the treatment effects. Alternatively, the observed significance level can be
calculated as

$$P = \mathrm{Prob}(\mathbf{F}_{t-1,(n-1)(t-1)} > F_{t-1,(n-1)(t-1)}) ,$$

where $\mathbf{F}_{t-1,(n-1)(t-1)}$ is a random variable with an F-distribution on $t - 1$ and $(n - 1)(t - 1)$ df.
Using analogous reasoning, we can derive a test statistic for a null hypothesis of the
block effects being equal to zero, using the ratio BlkMS/ResMS. This ratio also has an
F-distribution under the null hypothesis, now with $n - 1$ (BlkDF) and $(n - 1)(t - 1)$ (ResDF)
df. However, we consider that blocks and treatments have a different status within the
model, with blocks reflecting the experimental structure rather than treatments of interest;
so, it follows that we might regard tests of hypotheses about block effects differently from
tests of hypotheses about treatment effects. Treatments are imposed on the experiment
specifically to examine the differences between them; so, we always want to test the null
hypothesis of no treatment differences. Blocks have been used to control inherent variation
among the experimental units; so, we can use the F-statistic for blocks to evaluate the
extent of this variation, and to help us design future similar experiments.

TABLE 7.3

Structure of the ANOVA Table for a RCBD with n Blocks (Factor Block) and t Treatments (Factor
Treatment), and N = n × t Observations in Total (Observed Significance Levels Omitted)

Source of Variation    df                Sum of Squares    Mean Square                       Variance Ratio
Block                  n - 1             BlkSS             BlkMS = BlkSS/(n - 1)             BlkMS/ResMS
Treatment              t - 1             TrtSS             TrtMS = TrtSS/(t - 1)             TrtMS/ResMS
Residual               (n - 1)(t - 1)    ResSS             ResMS = ResSS/[(n - 1)(t - 1)]
Total                  N - 1             TotSS

The information about degrees of freedom, sums of squares, mean squares and variance 
ratios is combined to form a simple ANOVA table for a RCBD as shown in Table 7.3. 

EXAMPLE 7.1C: POTATO YIELDS* 

Using the formulae presented earlier for sums of squares and for degrees of freedom, 
the ANOVA table for this trial takes the form shown in Table 7.4. 

At this stage, we should validate the analysis using the residual plots described in 
Section 5.2. A composite set of residual plots using standardized residuals is shown in 
Figure 7.1. The Normal plot shows a reasonably straight line, and the histogram is not 
inconsistent with a Normal distribution (although note that with relatively few observa- 
tions, the histogram provides a fairly poor diagnostic tool here). The fitted values plot 
has a slight suggestion of smaller variances for smaller fitted values, but this is difficult 
to judge because there are few observations in that region of the plot. Note that in the 
RCBD, the fitted values do not directly correspond to the treatment groups (as they do 
in the CRD) because both block and treatment effects contribute to the fitted values. 

We judge these plots to be acceptable and move on to interpret the analysis and draw 
conclusions. 

The observed significance level (P) for the Fungicide variance ratio is obtained by
comparison of the treatment variance ratio ($F_{4,12}$ = 9.576) with the quantiles of the
F-distribution with 4 (TrtDF) and 12 (ResDF) degrees of freedom. Here, P = 0.001,
indicating that the observed treatment differences are very unlikely to have happened by
chance if the null hypothesis is true; hence, we reject the null hypothesis at the 0.1%
significance level and conclude that there are real differences among the set of treatment
means.



TABLE 7.4

ANOVA Table for Potato Yields Trial Set Up as a RCBD with Four Blocks (Factor Block) and
Five Treatments (Factor Fungicide) (Example 7.1C)

Source of Variation    df    Sum of Squares    Mean Square    Variance Ratio    P
Block                   3         14,987.20        4995.73             1.434    0.283
Fungicide               4        133,419.20      33,354.80             9.576    0.001
Residual               12         41,796.80        3483.07
Total                  19        190,203.20

FIGURE 7.1

Composite set of residual plots using standardized residuals from the potato yields trial (Example 7.1C).
[Plots not reproduced; axis labels recovered from the original: Fitted value, Standardized residual, Normal quantile.]



The observed significance level associated with the Block variance ratio ($F_{3,12}$ = 1.434)
is P = 0.283; so, the test statistic is consistent with the null hypothesis of no block
differences. Even so, the BlkMS is larger than the ResMS; so, taking account of the block
heterogeneity has reduced the ResMS, which in turn increases the precision of treatment
comparisons. For this particular experiment, the advantage of using a RCBD instead of
a CRD was small (based on a comparison of the relative sizes of the BlkMS and ResMS).
However, field trials are notoriously heterogeneous; so, it would be unwise to use this
result to abandon blocking for future similar experiments - sensible use of blocking still 
provides insurance against unit-to-unit heterogeneity. In other contexts, if prior knowl- 
edge or further experimentation indicated that there was generally little advantage in 
blocking for the type of experiment, then it would be sensible to weigh up the possible 
benefit of blocking against the cost in degrees of freedom - the reduction in ResDF could 
have a detrimental impact on the power of the experiment to detect treatment differ- 
ences of interest. These issues are discussed further in Chapter 10. 
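
The entries of Table 7.4 can be reproduced from the formulae in this section. The sketch below is a plain-Python illustration, assuming the yields are entered from Table 7.2 (rows are blocks, columns are Control, F1, F2, F3, F4); the variable names are ours. It stops at the variance ratios: the observed significance levels need the F-distribution, which is not in the Python standard library, so they are not computed here.

```python
# Plot yields from Table 7.2: rows are blocks 1-4, columns are Control, F1, F2, F3, F4.
yields = [
    [377, 527, 633, 642, 623],
    [408, 604, 600, 708, 550],
    [500, 606, 650, 662, 562],
    [333, 533, 567, 504, 667],
]
n, t = len(yields), len(yields[0])

grand_mean = sum(sum(row) for row in yields) / (n * t)
block_means = [sum(row) / t for row in yields]
treatment_means = [sum(row[j] for row in yields) / n for j in range(t)]

# Sums of squares (Equation 7.2 and the formulae that follow it).
tot_ss = sum((y - grand_mean) ** 2 for row in yields for y in row)
blk_ss = t * sum((m - grand_mean) ** 2 for m in block_means)
trt_ss = n * sum((m - grand_mean) ** 2 for m in treatment_means)
res_ss = tot_ss - blk_ss - trt_ss

# Degrees of freedom and mean squares.
blk_df, trt_df = n - 1, t - 1
res_df = blk_df * trt_df
blk_ms, trt_ms, res_ms = blk_ss / blk_df, trt_ss / trt_df, res_ss / res_df

# Variance ratios for blocks and treatments.
blk_vr, trt_vr = blk_ms / res_ms, trt_ms / res_ms

print(f"{'Source':10s}{'df':>4s}{'SS':>12s}{'MS':>12s}{'VR':>8s}")
print(f"{'Block':10s}{blk_df:4d}{blk_ss:12.2f}{blk_ms:12.2f}{blk_vr:8.3f}")
print(f"{'Fungicide':10s}{trt_df:4d}{trt_ss:12.2f}{trt_ms:12.2f}{trt_vr:8.3f}")
print(f"{'Residual':10s}{res_df:4d}{res_ss:12.2f}{res_ms:12.2f}")
print(f"{'Total':10s}{n * t - 1:4d}{tot_ss:12.2f}")
```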

One feature of the RCBD is that the two classifying factors - Treatment and Block - are 
orthogonal or independent. The mathematical definition of orthogonality is beyond the 
scope of this book (see, e.g. Bailey, 2008), but we can give some general intuitive insight 
into this property. If two factors are orthogonal, then the same ANOVA table is obtained 
regardless of the order in which the terms are fitted, and comparisons between the levels 
of one factor are unaffected by the levels of the other factor. This has the advantage that
interpretation of the ANOVA table is unambiguous. Examples and consequences of non- 
orthogonality are discussed in Chapter 11. 



Mathematical Aside 7.1 



We can use the form of the model to gain further insight into this idea of orthogonality.
For the RCBD, the treatment population means are estimated by the treatment sample
means. We can derive an expression for the treatment sample means from the model for
individual observations (Equation 7.1) by simply substituting the right-hand side of that
equation for $y_{ij}$ in the expression $\bar{y}_{.j} = \frac{1}{n} \sum_{i=1}^{n} y_{ij}$, giving

$$\bar{y}_{.j} = \frac{1}{n} \sum_{i=1}^{n} (\mu + b_i + \tau_j + e_{ij}) .$$

Expanding the summation for each term separately simplifies the expression to give

$$\bar{y}_{.j} = \mu + \frac{1}{n} \sum_{i=1}^{n} b_i + \tau_j + \frac{1}{n} \sum_{i=1}^{n} e_{ij} ,$$

and applying the constraint $\sum_i b_i = 0$, which we built into the model, causes the block
effects to be removed from the expression, leaving

$$\bar{y}_{.j} = \mu + \tau_j + \bar{e}_{.j} .$$

So, because each treatment occurs once within each block, the treatment means do not 
depend on the block effects, and hence, the treatment means (and estimated treatment 
effects) are independent of (i.e. orthogonal to) the block effects. A similar derivation can 
be obtained for block sample means, and because each block contains one instance of each 
treatment, block means are independent of treatment effects. ■ 
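
The same point can be illustrated numerically. In the sketch below (plain Python, our own illustration rather than part of the original text), an arbitrary constant is added to every yield in each block of the potato data from Table 7.2. The shifts in `block_shifts` are hypothetical values chosen only for this demonstration. Because each treatment occurs once in every block, the estimated treatment effects are unchanged, while the estimated block effects absorb the shifts.

```python
# Plot yields from Table 7.2: rows are blocks 1-4, columns are Control, F1, F2, F3, F4.
yields = [
    [377, 527, 633, 642, 623],
    [408, 604, 600, 708, 550],
    [500, 606, 650, 662, 562],
    [333, 533, 567, 504, 667],
]

# Hypothetical extra block effects, one per block (illustrative values only).
block_shifts = [100.0, -50.0, 0.0, 25.0]


def grand_mean(data):
    """Sample grand mean of a blocks-by-treatments table."""
    return sum(sum(row) for row in data) / (len(data) * len(data[0]))


def treatment_effects(data):
    """Treatment sample means minus the sample grand mean."""
    n, t = len(data), len(data[0])
    gm = grand_mean(data)
    return [sum(row[j] for row in data) / n - gm for j in range(t)]


def block_effects(data):
    """Block sample means minus the sample grand mean."""
    gm = grand_mean(data)
    return [sum(row) / len(row) - gm for row in data]


# Add the same constant to every unit within a block.
shifted = [[y + shift for y in row] for row, shift in zip(yields, block_shifts)]

effects_before = treatment_effects(yields)
effects_after = treatment_effects(shifted)
for before, after in zip(effects_before, effects_after):
    print(f"treatment effect before = {before:+8.2f}, after = {after:+8.2f}")

print("block effects before:", [round(b, 2) for b in block_effects(yields)])
print("block effects after: ", [round(b, 2) for b in block_effects(shifted)])
```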

One assumption underlying the RCBD model (Section 7.1) is that there are no interac- 
tions between blocks and treatments, i.e. that the expected treatment differences are the 
same in all blocks. This assumption is required for an unambiguous analysis, as it is not 
possible to separate the block x treatment interaction from the model deviations in this 
design. However, it is technically possible to use the Treatment factor as a substitute for 
the Unit factor in the model specification, because the Block.Treatment combinations also 
uniquely label the full set of observations, and then the residual line in the ANOVA table 
may be labelled as Block.Treatment. We believe that the potential confusion caused by this 
approach makes it imperative to retain and use the full set of structural factors in each data 
set, and this is the reason why we use dummy structural factors where the true allocation 
is not available. In some situations, the presence of a block x treatment interaction cannot 
be discounted and may be expected to be much larger than the deviations from other 
sources (individual variation, measurement error, etc.). In this case, the residual error line 
in the ANOVA table might be legitimately labelled as Block.Treatment, and the Treatment 
variance ratio can then be considered as evaluating the consistency of the treatment effects 
across the set of conditions represented by the blocks. 


7.4 Evaluating the Response to Treatments

The best estimate of the population mean for the $j$th treatment (denoted as $\mu_j$) was
identified in Section 7.2 as the treatment sample mean, i.e. $\hat{\mu}_j = \bar{y}_{.j}$. Uncertainty associated
with this estimate is measured by its estimated SE, the SEM,

$$\mathrm{SEM} = \mathrm{SE}(\hat{\mu}_j) = \sqrt{\frac{s^2}{n}} ,$$

with the estimate of background variation, $s^2 = \mathrm{ResMS}$, in place of the unknown true value.
In this case, as all the treatments have equal replication, their SEs are also equal. As for
the CRD, a $100(1 - \alpha_s)$% confidence interval for the population mean of the $j$th treatment
can be calculated as

$$\left( \hat{\mu}_j - [t^{[\alpha_s/2]}_{(n-1)(t-1)} \times \mathrm{SEM}] ,\ \hat{\mu}_j + [t^{[\alpha_s/2]}_{(n-1)(t-1)} \times \mathrm{SEM}] \right) ,$$

where $t^{[\alpha_s/2]}_{(n-1)(t-1)}$ is the $100(1 - \alpha_s/2)$th percentile of the t-distribution with $(n - 1)(t - 1)$ df.

The best estimate of a difference between two treatment population means is the
difference between the two treatment sample means, for example, for the $j$th and $k$th
treatments,

$$\hat{\mu}_j - \hat{\mu}_k = \bar{y}_{.j} - \bar{y}_{.k} .$$

The estimated standard error of this difference (denoted SED) takes the form

$$\mathrm{SED} = \mathrm{SE}(\hat{\mu}_j - \hat{\mu}_k) = \sqrt{\frac{2 s^2}{n}} ,$$

and again is the same for any pair of treatments. Under the null hypothesis that the
population means of the $j$th and $k$th treatments are equal, i.e. $H_0$: $\mu_j = \mu_k$, the statistic

$$t_{(n-1)(t-1)} = \frac{\hat{\mu}_j - \hat{\mu}_k}{\mathrm{SED}} = \frac{\bar{y}_{.j} - \bar{y}_{.k}}{\mathrm{SED}} ,$$

has a t-distribution with degrees of freedom equal to the ResDF, as denoted by its
subscript. This statistic can be compared with the quantiles of this t-distribution to test
the null hypothesis against a one- or two-sided alternative. A $100(1 - \alpha_s)$% confidence
interval for the difference between the population means for these treatments can be
computed as

$$\left( (\hat{\mu}_j - \hat{\mu}_k) - \mathrm{LSD} ,\ (\hat{\mu}_j - \hat{\mu}_k) + \mathrm{LSD} \right) ,$$

where $\mathrm{LSD} = t^{[\alpha_s/2]}_{(n-1)(t-1)} \times \mathrm{SED}$ is the least significant difference at significance level $\alpha_s$.

EXAMPLE 7.1D: POTATO YIELDS*

From the ANOVA table obtained earlier (Table 7.4), the background variation is estimated
as $s^2$ = ResMS = 3483.07. The treatment sample means were shown in Table 7.2.
There are four blocks, $n = 4$; so, the standard errors of the means are equal to

$$\mathrm{SEM} = \sqrt{\frac{3483.07}{4}} = \sqrt{870.77} = 29.51 ,$$

with the SED for any pair of treatments equal to

$$\mathrm{SED} = \sqrt{\frac{2 \times 3483.07}{4}} = \sqrt{1741.53} = 41.73 .$$

The residual df is 12; so, for $\alpha_s = 0.05$, $t^{[0.025]}_{12} = 2.179$ with LSD = 90.93. Confidence
intervals can be derived from these values as shown in Examples 4.1D and E. It is clear
that each of the fungicide treatments gives a statistically significant improvement in 
yield compared with the untreated control, but there appears to be little real difference 
among the four fungicide treatments. 
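
The arithmetic in this example is easy to script. The following Python sketch uses only the standard library; the residual mean square, the treatment means and the tabulated t value (2.179 on 12 df) are copied from the text rather than computed, and the variable names are illustrative.

```python
import math

res_ms = 3483.07      # ResMS from Table 7.4
n_blocks = 4          # replication of each treatment
t_crit = 2.179        # 97.5th percentile of t on 12 df, as quoted in the text

# Treatment sample means from Table 7.2.
means = {"Control": 404.5, "F1": 567.5, "F2": 612.5, "F3": 629.0, "F4": 600.5}

sem = math.sqrt(res_ms / n_blocks)        # standard error of a treatment mean
sed = math.sqrt(2 * res_ms / n_blocks)    # standard error of a difference
lsd = t_crit * sed                        # least significant difference (5% level)

print(f"SEM = {sem:.2f}, SED = {sed:.2f}, LSD = {lsd:.2f}")

# Compare each fungicide with the untreated control.
for name in ("F1", "F2", "F3", "F4"):
    diff = means[name] - means["Control"]
    lower, upper = diff - lsd, diff + lsd
    print(f"{name} - Control = {diff:6.1f}  95% CI ({lower:6.1f}, {upper:6.1f})"
          f"  exceeds LSD: {diff > lsd}")

# Largest difference among the four fungicides themselves.
fungicide_means = [means[k] for k in ("F1", "F2", "F3", "F4")]
max_fungicide_diff = max(fungicide_means) - min(fungicide_means)
print(f"largest fungicide difference = {max_fungicide_diff:.1f}"
      f"  exceeds LSD: {max_fungicide_diff > lsd}")
```

Every fungicide differs from the control by more than the LSD, whereas the largest difference among the fungicides (61.5) is smaller than the LSD, in line with the conclusions drawn above.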



7.5 Incorporating Strata: The Multi-Stratum Analysis of Variance 

We use the term structural component to encompass all the structure within the set of
observations. The structural component therefore includes both blocks imposed by the
experimenter (e.g. units with similar time of processing, spatially grouped units within
a field), and any other structure within the experiment (e.g. the nesting of fields within
farms, or aphids within cages), including the sub-sampling of individual units (e.g. plants
within a field plot) sometimes referred to as pseudo-replication.

The simple ANOVA table derived in Section 7.3 does not make any distinction between
the explanatory and structural components of the model. The multi-stratum ANOVA table
is an alternative, and more general, form that preserves the distinction between terms
describing the underlying structure of the data (structural component) and those indicating
the treatments applied (explanatory component). Again, the formal mathematical definition
of strata is beyond the scope of this book (see, e.g. Bailey, 2008), but strata can be informally
regarded as the different structural sources of variability among the experimental units (see
also Section 3.2). Each term in the structural component generates a stratum; so, the RCBD
has two strata: one corresponding to variation between blocks (generically the Block term),
and one corresponding to variation between units within blocks (the Block.Unit term). If
we sub-sample within units, then this adds another stratum, as in the following example.

EXAMPLE 7.2A: POTATO YIELDS USING ROW DATA* 

Each plot of the RCBD potato fungicide trial (described in Example 7.1A) consisted of 
four rows, and the means of the measurements from these four rows were used as the 
yield observations in the previous analysis. The individual row yields are also available, 
giving a new data set with 80 observations. The field layout is shown in Table 7.5. 

Guard rows were planted so that edge effects were absent, and row effects were 
expected to be local so that rows can be reasonably regarded as nested within plots. 






TABLE 7.5

Field Layout for the Potato Yields Trial Showing Individual Row Yields (Example
7.2A and File potatorow.dat)

                        Plot 1     Plot 2     Plot 3     Plot 4     Plot 5
Block 1    Treatment    F3         Control    F2         F1         F4
           Row 1        720        348        652        635        642
           Row 2        528        405        658        512        639
           Row 3        678        364        569        536        642
           Row 4        642        391        653        425        569
Block 2    Treatment    F2         Control    F3         F4         F1
           Row 1        554        411        682        639        583
           Row 2        618        374        741        544        530
           Row 3        621        396        712        521        629
           Row 4        607        451        697        496        674
Block 3    Treatment    Control    F2         F3         F4         F1
           Row 1        561        555        638        505        598
           Row 2        491        633        712        597        620
           Row 3        429        715        633        607        596
           Row 4        519        697        665        539        610
Block 4    Treatment    F3         F2         F1         Control    F4
           Row 1        451        513        441        367        631
           Row 2        493        626        467        319        618
           Row 3        535        574        701        361        689
           Row 4        537        555        523        285        730

To fully describe the structure of this data set, a new factor Row is required to specify
the allocation of observations to rows within plots, in addition to the factors Block and
Plot. A model for these data can be written as

$$\mathit{RowYield}_{ijk} = \mu + \mathit{Block}_i + \mathit{Fungicide}_j + \mathit{Block.Plot}_{ij} + e_{ijk} ,$$

where $\mathit{RowYield}_{ijk}$ is the yield obtained from the $k$th row ($k = 1 \ldots 4$) in the plot with the
$j$th treatment ($j = 1 \ldots 5$) in the $i$th block ($i = 1 \ldots 4$). The overall mean, $\mu$, is now the
population mean with respect to row yields, $\mathit{Block}_i$ is the effect of the $i$th block and $\mathit{Block.Plot}_{ij}$
is the effect of the plot with the $j$th treatment in the $i$th block. The deviations, $e_{ijk}$, now
correspond to observations on rows nested within plots, which are in turn nested within
blocks. This three-level nested structure is denoted in symbolic form as

Structural component: Block/Plot/Row

which can be expanded to individual model terms as

Structural component: Block + Block.Plot + Block.Plot.Row

In this case, there are three strata, which correspond to blocks (Block), plots within
blocks (Block.Plot) and rows within plots (Block.Plot.Row), with the latter term
corresponding to the model deviations. This illustrates a case where the blocking structure
reflects both blocks imposed by the experimenter and other structure, in this case, the
presence of rows within plots.
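
To make the nested Block/Plot/Row structure concrete, the sketch below (plain Python, our own illustration) enters the row yields from Table 7.5, expands them into one record per row labelled by block, plot, row and treatment, and checks that the plot means of the row yields recover the plot yields of Table 7.1. The container names `row_yields`, `records` and `plot_means` are illustrative assumptions, not names used in the file potatorow.dat.

```python
# Row yields from Table 7.5: for each block, a list of plots 1-5 in field order,
# each given as (treatment, [yield of row 1, row 2, row 3, row 4]).
row_yields = {
    1: [("F3", [720, 528, 678, 642]), ("Control", [348, 405, 364, 391]),
        ("F2", [652, 658, 569, 653]), ("F1", [635, 512, 536, 425]),
        ("F4", [642, 639, 642, 569])],
    2: [("F2", [554, 618, 621, 607]), ("Control", [411, 374, 396, 451]),
        ("F3", [682, 741, 712, 697]), ("F4", [639, 544, 521, 496]),
        ("F1", [583, 530, 629, 674])],
    3: [("Control", [561, 491, 429, 519]), ("F2", [555, 633, 715, 697]),
        ("F3", [638, 712, 633, 665]), ("F4", [505, 597, 607, 539]),
        ("F1", [598, 620, 596, 610])],
    4: [("F3", [451, 493, 535, 537]), ("F2", [513, 626, 574, 555]),
        ("F1", [441, 467, 701, 523]), ("Control", [367, 319, 361, 285]),
        ("F4", [631, 618, 689, 730])],
}

# Plot yields from Table 7.1 (plots 1-5 within each block).
plot_yields = {
    1: [642, 377, 633, 527, 623],
    2: [600, 408, 708, 550, 604],
    3: [500, 650, 662, 562, 606],
    4: [504, 567, 533, 333, 667],
}

# One record per row: the structural factors plus the treatment and the response.
records = [
    {"Block": block, "Plot": plot, "Row": row, "Fungicide": trt, "RowYield": y}
    for block, plots in row_yields.items()
    for plot, (trt, ys) in enumerate(plots, start=1)
    for row, y in enumerate(ys, start=1)
]

# Number of units at each level of the nested structure.
n_blocks = len({r["Block"] for r in records})
n_plots = len({(r["Block"], r["Plot"]) for r in records})
n_rows = len(records)
print(f"{n_blocks} blocks, {n_plots} plots within blocks, {n_rows} rows within plots")

# Plot means of the row yields, indexed by (block, plot).
plot_means = {
    (block, plot): sum(ys) / len(ys)
    for block, plots in row_yields.items()
    for plot, (trt, ys) in enumerate(plots, start=1)
}
for (block, plot), mean in plot_means.items():
    assert mean == plot_yields[block][plot - 1]
print("plot means of row yields match the plot yields in Table 7.1")
```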

TABLE 7.6

Structure of the Multi-stratum ANOVA Table for a RCBD with n Blocks (Factor Block), t Units per
Block (Factor Unit), t Treatments (Factor Treatment) and N = n × t Observations in Total

Source of Variation     df                Sum of Squares    Mean Square                       Variance Ratio
Block stratum
  Residual              n - 1             BlkSS             BlkMS = BlkSS/(n - 1)             BlkMS/ResMS
Block.Unit stratum
  Treatment             t - 1             TrtSS             TrtMS = TrtSS/(t - 1)             TrtMS/ResMS
  Residual              (n - 1)(t - 1)    ResSS             ResMS = ResSS/[(n - 1)(t - 1)]
Total                   N - 1             TotSS





The multi-stratum ANOVA approach results in an ANOVA table with separate components for
each of the strata defined by the blocking structure. The variation within each stratum
(i.e. at each level of the design) is then partitioned into sums of squares associated with
the treatments that vary between units at that level of the design (if any) and a residual
term. For example, the simple ANOVA table for the RCBD can be rewritten in the form shown in
Table 7.6.

This ANOVA table has two strata, corresponding to the Block and Block.Unit terms in the
structural component. The data in the Block stratum correspond to block totals calculated
after subtraction of (i.e. adjustment for) the sample grand mean. This is sometimes referred
to as inter-block information. Since every treatment has been applied once in each block,
block differences cannot be attributed to treatment differences (see below for a more
mathematical argument). So, variation in the Block stratum consists of only a residual term
representing the background variation between blocks, which is exactly the same
interpretation of the block sum of squares (BlkSS) as seen earlier. The data in the
Block.Unit stratum correspond to the original observations adjusted for the relevant block
means, sometimes referred to as intra-block information. In this stratum, every unit within
a block has a treatment different from the others, and so variation between units includes
variation due to treatments; hence, some variation within this stratum can be attributed to
treatment differences. The Block.Unit stratum variation is thus partitioned as variation due
to treatments (TrtSS) plus residual variation (ResSS).
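The partition described above can be sketched numerically. The following is a minimal illustration, assuming a complete RCBD with exactly one unit per treatment in each block; the function name `rcbd_sums_of_squares` and the yields are hypothetical and are not taken from the book's data files.

```python
def rcbd_sums_of_squares(y):
    """Partition the total SS for an RCBD; y[i][j] is block i, treatment j."""
    n, t = len(y), len(y[0])            # n blocks, t treatments
    grand = sum(map(sum, y)) / (n * t)  # sample grand mean
    block_means = [sum(row) / t for row in y]
    trt_means = [sum(y[i][j] for i in range(n)) / n for j in range(t)]

    tot_ss = sum((y[i][j] - grand) ** 2 for i in range(n) for j in range(t))
    # Block stratum: variation between blocks (inter-block information).
    blk_ss = t * sum((m - grand) ** 2 for m in block_means)
    # Block.Unit stratum: observations adjusted for their block means
    # (intra-block information), split into treatment and residual parts.
    unit_ss = sum((y[i][j] - block_means[i]) ** 2
                  for i in range(n) for j in range(t))
    trt_ss = n * sum((m - grand) ** 2 for m in trt_means)
    res_ss = unit_ss - trt_ss
    return {"BlkSS": blk_ss, "TrtSS": trt_ss, "ResSS": res_ss, "TotSS": tot_ss,
            "df": {"Block": n - 1, "Treatment": t - 1,
                   "Residual": (n - 1) * (t - 1), "Total": n * t - 1}}


# Hypothetical yields: 3 blocks (rows) by 4 treatments (columns).
yields = [[10.0, 12.0, 15.0, 11.0],
          [11.0, 14.0, 16.0, 13.0],
          [9.0, 11.0, 13.0, 10.0]]
table = rcbd_sums_of_squares(yields)
trt_ms = table["TrtSS"] / table["df"]["Treatment"]
res_ms = table["ResSS"] / table["df"]["Residual"]
print(table)
print("Treatment variance ratio:", trt_ms / res_ms)
```

The three component sums of squares add up to TotSS, and the treatment variance ratio uses the residual mean square from the same (Block.Unit) stratum, as in Table 7.6.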



Mathematical Aside 7.2 

To establish in which strata different treatment effects are estimated, you should consider 
the form of data within each stratum of the design. In general, data at the top level of a 
structure correspond to unit totals within that stratum, calculated after subtraction of the 
sample grand mean; for example, for the RCBD, data in the Block stratum are the mean- 
adjusted block totals, calculated as 

\[
\sum_{j=1}^{t} (y_{ij} - \bar{y}) = y_{i\cdot} - t\bar{y} = y_{i\cdot} - \frac{1}{n}\, y_{\cdot\cdot},
\quad \text{for } i = 1 \ldots n.
\]

At the next level down, data again correspond to unit totals within the stratum, calculated
after subtraction of the means corresponding to units in higher strata; for example, for the
RCBD, data in the Block.Unit stratum are the observations adjusted for the block means

\[
y_{ij} - \bar{y}_{i\cdot} = y_{ij} - \frac{1}{t}\, y_{i\cdot},
\quad \text{for } i = 1 \ldots n,\ j = 1 \ldots t.
\]

The data in each stratum can then be written algebraically in terms of model parameters 
by substitution of the model formula in place of the response values. When treatment 
effects are present in the algebraic expression for data within a particular stratum, it fol- 
lows that there is information on these treatments within that stratum. 

For the RCBD, the underlying model is given in Equation 7.1, and the expression for the 
mean-adjusted block totals can therefore be rewritten, after substitutions based on this 
model, as 






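\[
y_{i\cdot} - \frac{1}{n}\, y_{\cdot\cdot}
= \left( t\mu + t b_i + \sum_{j=1}^{t} \tau_j + \sum_{j=1}^{t} e_{ij} \right)
- \frac{1}{n} \left( n t \mu + t \sum_{i=1}^{n} b_i + n \sum_{j=1}^{t} \tau_j
+ \sum_{i=1}^{n} \sum_{j=1}^{t} e_{ij} \right)
= t b_i + e_{i\cdot} - \frac{1}{n}\, e_{\cdot\cdot},
\]

with the inbuilt model constraints \(\sum_i b_i = 0\) and \(\sum_j \tau_j = 0\) used to simplify the expression. This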
expression does not involve the treatment effects, and so, this stratum does not contain 
information about treatment differences. Variation at this level is related to block effects 
and the deviations. Similarly, the adjusted observations in the lower stratum (units within 
blocks) can be written as 



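\[
y_{ij} - \frac{1}{t}\, y_{i\cdot}
= (\mu + b_i + \tau_j + e_{ij})
- \frac{1}{t} \left( t\mu + t b_i + \sum_{l=1}^{t} \tau_l + \sum_{l=1}^{t} e_{il} \right)
= \tau_j + e_{ij} - \frac{1}{t}\, e_{i\cdot},
\]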



again with the constraints used to simplify the expression. Units within a block clearly 
have different treatments applied, and this expression confirms that unit differences do 
hold information on treatment differences. ■ 



The multi-stratum ANOVA table for the RCBD rearranges the simpler form to reflect the 
structure of the experiment. However, the great advantage of the multi-stratum ANOVA 
is the recognition of the interplay between the blocking and treatment structures so that 
treatment effects are always allocated to the correct strata, and an appropriate measure 
of precision can be calculated for the comparison of treatment means, with the correct 
degrees of freedom. One example where this can be particularly important is where 
pseudo-replication is present in the structure (see Section 3.1.1), for example, where sub- 
sampling or technical replication (several measurements per experimental unit) have been 
used to reduce measurement error. An example of this situation is presented below. 



EXAMPLE 7.2B: POTATO YIELDS USING ROW DATA* 

Here, we analyse the individual row data (see Example 7.2A) as presented in Table 7.5. 
The full data set, including classifying factors Block, Plot and Row, can also be found
in file potatorow.dat. The multi-stratum ANOVA table for these data corresponds to
the model

Response variable: RowYield

Explanatory component: [1] + Fungicide 

Structural component: Block/Plot/Row 

This ANOVA table is in Table 7.7 and has three strata corresponding to blocks (Block),
plots within blocks (Block.Plot) and rows within plots (Block.Plot.Row). Treatments
(factor Fungicide) are applied to plots; so, treatment differences are estimated from
the differences between plots, and the TrtSS is a component of the variability within the
Block.Plot stratum. The Block, Fungicide and Block.Plot residual sums of squares are
equal to four times the BlkSS, TrtSS and ResSS from the analysis of plot means given
in Example 7.1C. This multiplication by four for each term is due to the presence of four
observations from each plot (i.e. from the four separate rows). As the degrees of freedom
in the Block and Block.Plot strata are the same as in Table 7.4, the variance ratios for
the Block residual and Fungicide terms are preserved. The conclusions with respect to
the treatments are thus unchanged. Since treatments are applied to plots, the Block.Plot
residual mean square (and not the Block.Plot.Row residual mean square) is the appropriate
measure of background variation for estimates of Fungicide SEMs, SEDs and LSDs,
with degrees of freedom equal to the residual df from the Block.Plot stratum.

As an illustration of the importance of specifying the correct blocking structure, sup- 
pose that the presence of sub-sampling was ignored, and that only the block and treat- 
ment factors (Block and Fungicide) were specified in the analysis. This would lead to 
the simple ANOVA presented in Table 7.8. 



TABLE 7.7 

Multi-stratum ANOVA Table for Potato Yields Trial Using Yields from Four Rows (Factor Row) 
per Plot (Factor Plot) (Example 7.2B) 



Source of Variation       df   Sum of Squares   Mean Square   Variance Ratio         P
Block stratum
  Residual                 3        59,948.80     19,982.93            1.434     0.283
Block.Plot stratum
  Fungicide                4       533,676.80    133,419.20            9.576     0.001
  Residual                12       167,187.20     13,932.27            4.474   < 0.001
Block.Plot.Row stratum
  Residual                60       186,848.00       3114.13
Total                     79       947,660.80




TABLE 7.8 

Incorrect ANOVA Table (Ignoring Strata) for Potato Yields Trial Using Individual Row 
Yields (Example 7.2B) 


Source of Variation       df   Sum of Squares   Mean Square   Variance Ratio         P
Block                      3        59,948.80     19,982.93            4.064     0.010
Fungicide                  4       533,676.80    133,419.20           27.133   < 0.001
Residual                  72       354,035.20       4917.16
Total                     79       947,660.80

Omission of information on the full structure (rows within plots within blocks) means
that the two lower levels of background variation (due to plots within blocks, and rows 
within plots) cannot be separated and are combined in the analysis. This leads to an 
estimate of the background variation to be used for treatment comparisons that is much 
smaller than it should be (a residual mean square of 4917 rather than 13,932), with many 
more residual degrees of freedom (72 instead of 12). This is typical of a situation where 
pseudo-replication is ignored. Hence, compared with the correct analysis, the Block 
and Fungicide variance ratios are inflated, and treatment SEMs, SEDs and LSDs are 
greatly underestimated, leading to incorrect inferences. 
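The arithmetic behind this comparison can be checked directly from the tabulated sums of squares. The sketch below copies the values from Tables 7.7 and 7.8; the variable names are illustrative only, and no P-values are computed.

```python
import math

# Sums of squares and degrees of freedom copied from Table 7.7.
fung_ss, fung_df = 533676.80, 4
plot_res_ss, plot_res_df = 167187.20, 12   # Block.Plot stratum residual
row_res_ss, row_res_df = 186848.00, 60     # Block.Plot.Row stratum residual

fung_ms = fung_ss / fung_df

# Correct analysis: treatments are compared against the Block.Plot residual.
plot_res_ms = plot_res_ss / plot_res_df
correct_vr = fung_ms / plot_res_ms

# Ignoring the strata pools the two lower residual terms (Table 7.8).
pooled_df = plot_res_df + row_res_df
pooled_ms = (plot_res_ss + row_res_ss) / pooled_df
naive_vr = fung_ms / pooled_ms

# Standard errors scale with the square root of the mean square used,
# so this ratio shows how far the incorrect analysis shrinks SEMs and SEDs.
se_ratio = math.sqrt(pooled_ms / plot_res_ms)

print(f"correct: residual MS = {plot_res_ms:.2f} on {plot_res_df} df, VR = {correct_vr:.3f}")
print(f"pooled:  residual MS = {pooled_ms:.2f} on {pooled_df} df, VR = {naive_vr:.3f}")
print(f"SE(pooled) / SE(correct) = {se_ratio:.2f}")
```

With these values, the pooled residual mean square is roughly a third of the correct one, so the standard errors from the incorrect analysis are about 60% of their proper size.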

In multi-stratum ANOVA tables, it is possible to test the null hypotheses associated with
the structural terms using comparisons between nested strata. For example, in Example
7.2B, we can test whether there is any evidence of non-zero plot effects by comparing the
residual mean square from the Block.Plot stratum with that from the Block.Plot.Row stratum.
As stated previously, this is usually a side issue in the analysis, although the information
may be useful when designing further experiments.

We take the opportunity to restate here that to preserve the distinction between the
explanatory and structural components, it is necessary to store factors associated with both
of these components within a data set. Although it is often possible to obtain the correct
analysis without doing this, we believe that this loses information on the exact experimental
layout and can lead to unnecessary confusion (see comments at the end of Section 7.3).

Unfortunately, the multi-stratum ANOVA table can be formed only when the explanatory
and structural components obey certain conditions of balance, and the details are further
discussed in Chapters 9, 11 and 16. The simplest case occurs when block and treatment
factors are orthogonal as in the RCBD (see Section 7.3), so that each term can be estimated
independently of the other.



EXERCISES 

7.1* A controlled environment experiment to compare the effect of a diet on weight
of three aphid species was conducted using a RCBD with three blocks.

a. What are the null and alternative hypotheses for this experiment?

b. Construct the ANOVA table given that BlkSS = 0.00317, TrtSS = 0.35106 and
TotSS = 0.36195.

c. What is the appropriate F-distribution for the treatment variance ratio under
the null hypothesis? What is the 5% critical value from this distribution?

d. Would we accept or reject the null hypothesis?

7.2* A field trial to test the response of a crop to five fertilizer treatments (0, 50, 100,
150 and 200 kg/ha of N) was designed as a RCBD with four blocks of five plots
(factors Block and Plot, respectively). The yield at harvest was recorded for each
plot. The file fertilizer.dat contains the unit numbers (ID), structural factors
(Block, Plot), applied rates of N (factor N) and the yields (variate Yield).

a. Write down a mathematical model for the yields.

b. Construct a multi-stratum ANOVA table by calculating the total, block,
treatment and residual sums of squares and df and then deriving the other
columns. Is there any evidence of differences in yield among the fertilizer
treatments?

c. Calculate the estimated mean for each treatment.

d. Calculate the LSD at the 5% level for the difference between any two treatment
means and use it to compare the yields obtained for 150 and 200 kg of
N applied.

e. Use residual plots to check whether the model assumptions are reasonable.

f. Write a short summary of the results of the analysis.

7.3 A 2-year field experiment investigated the effects of soil cultivation on the activity
of beneficial arthropods. Plots of winter oilseed rape were laid out as a RCBD
with five blocks of three plots. Three soil cultivation treatments were to be compared:
ploughing in both years, minimum tillage in both years and minimum
tillage in year 1 followed by ploughing in year 2. We consider data from the
first season, when the latter two treatments were equivalent, resulting in two
first-year treatments 'plough' (n1 = 5, one plot per block) and 'minimum tillage'
(n2 = 10, two plots per block). The accumulated catch of three pitfall traps per
plot during a 3-month period was recorded for various arthropod species; here,
we analyse counts of spiders of the taxa Oedothorax. The plot-level unit numbers
(ID), structural factors (Block, Plot), treatments applied (factor Treatment) and
the total count data (variate PlotCount) can be found in the file oedoplot.dat.* * 

a. Use multi-stratum ANOVA to determine whether these soil cultivation 
methods affect spider numbers. Obtain the standard errors for each treat- 
ment mean and the standard error of the difference between the two means 
(you will need to take into account the differing replication, as in Section 
4.4). Produce and interpret a composite set of residual plots. 

b. The trap-level unit numbers (ID), structural factors (Block, Plot and Trap), 
treatments applied (factor Treatment) and individual counts from the 
three pitfall traps in each plot (variate TrapCount) can be found in file 
OEDOTRAP.DAT. Obtain the multi-stratum ANOVA table and residual plots 
for these data. Compare and contrast your results here with those obtained 
in part (a) and discuss any differences. 

7.4 A controlled environment experiment investigated the impact of inoculation 
rate on leaf symptoms in oilseed rape. Four rates of inoculation were chosen 
(0.4, 4, 40 and 400) and each rate was tested on six oilseed rape cultivars. The 
experiment was carried out in three runs. In each run (or occasion), the 24 
treatments were randomly allocated to 24 single plants in pots, and the aver- 
age percentage area of leaf infected from two leaves per plant was recorded. 
The unit numbers (ID), structural factors (Occasion, Pot), explanatory factor
(Treatment) and responses (variate PInfected) can be found in file
inoculation.dat. Analyse these data on an appropriate scale using ANOVA accounting
for blocks and the set of 24 treatments. Is there any evidence that the area of leaf 
infected differs among 24 treatments? (We re-visit these data in Exercise 8.1.)^ 



Data from A. Ferguson, Rothamsted Research. 

* Data from N. Evans, Rothamsted Research.

In previous chapters, where the principles of designing experiments and several different
designs were introduced, the focus was on understanding and specifying the structure of
the experimental units, i.e. the structural component of the model. In all of these situations,
the explanatory component of the model, consisting of two or more treatment groups, has
been represented using a single factor. Testing the null hypothesis of ANOVA allows us to
answer the broad question of whether the treatments differ from one another, but usually
we are interested in more structured comparisons between treatments, and investigating
these is the subject of this chapter.

The process of treatment selection begins with specification of the biological questions
to be answered or hypotheses to be tested by the experiment, which in turn suggests a set
of experimental treatments. Comparisons between these treatments can be turned into
statistical hypotheses, so that it is clear which questions can be answered by statistical
analyses. Often several different sets of treatments might be considered, each of which enables
slightly different questions to be answered, and statistical considerations such as efficiency
and precision can help to choose between the different sets. The role of statistical evaluation
is therefore important even during the preliminary planning stages of an experiment.
It is also helpful to realize that it is possible, and usually desirable, to address more than
one hypothesis within a single experiment, and to appreciate that different aspects of a
statistical analysis will be appropriate to address different types of question. For example,
consider an experiment set up as a RCBD to compare the effects of three increasing doses of
growth regulator with a control treatment (no regulator applied). The question of whether
growth regulator affects yield can be directly answered by an F-test from an ANOVA table,
but the more important question of how growth regulator affects yield is best addressed
by examination of the pattern of response to dose.

In this chapter, we examine ways of translating questions about a set of treatments into a
statistical analysis. Here, we do not emphasize the structure of the experiment, but remember
throughout that specification of the correct structural model is required to obtain the
correct analysis. The structure of this chapter is summarized in Table 8.1.

This chapter begins with an overview of several common types of question and the
corresponding structure used for their statistical analysis (Section 8.1). We are often interested
in comparisons relating to distinct factors underlying the set of treatments. The complete
definition, analysis and interpretation of a crossed structure for two factors are described
(Section 8.2) and then extended to three or more factors with the emphasis on interpretation
(Section 8.3). In some circumstances, a nested structure is more appropriate (Section 8.4), and
this structure can also be used to allow for the presence of control or standard treatments
(Section 8.5). Often, specific treatment comparisons are required to answer scientific
questions. These comparisons may be incorporated in the analysis via treatment contrasts (Section
8.6), which can also be used to model patterns of response for factors with quantitative values
(Section 8.7). Following analysis, it can be useful to make comparisons between treatments
based on tables of predicted means. We describe some methods for making specific types of
comparisons and then discuss some issues associated with this approach (Section 8.8).

TABLE 8.1

Location of Topics Discussed in Chapter 8

Section   Topic
8.1       Overview
8.2       Crossed explanatory structure for two factors
8.3       Crossed explanatory structure with three or more factors
8.4       Nested explanatory structure
8.5       Nested structure to account for control or standard treatments
8.6       Use of contrasts to make specific comparisons
8.7       Use of polynomial contrasts to model response to quantitative factors
8.8       Making treatment comparisons from predicted values



8.1 From Scientific Questions to the Treatment Structure 

The structure of the treatments in an experiment should relate to the scientific questions
addressed. Many experiments are concerned with assessing how several different types of
treatment affect the response of a biological system. For example, in an experiment to
investigate storage conditions for potatoes, the interest might be in establishing the effect of
different temperatures (3°C, 5°C or 7°C) and humidities (low, 92.5%, and normal, 95%) on
subsequent frying quality. The temperatures and humidities can be considered as two different
types of treatments that have been combined together. At this point, it is helpful to clarify some
terminology. A treatment factor is a group of treatments of a common type; for example, in
the above experiment, the temperatures correspond to one treatment factor, and the humidity
to a second treatment factor. The factor levels correspond to the groups labelled by each
treatment factor (e.g. the three individual temperatures), and an experimental treatment is a
combination made by taking one level from each of the treatment factors used in the
experiment (e.g. temperature 5°C with 95% humidity). A factorial treatment structure consists of
all possible experimental treatments constructed by taking one level from each of the
treatment factors. So in the example above, a factorial structure consists of six treatments: all three
temperatures tested at both humidity levels. This structure is often called a 3 × 2 factorial, i.e.
a factorial structure with two factors, one with three levels and the other with two levels. This
is also sometimes referred to as a two-way structure, i.e. a structure with two factors, leading
to a two-way ANOVA. The concept of a factorial structure can be extended to any number
of factors (an r-way structure for r factors). For example, a 3 × 3 × 2 factorial contains three
factors, two with three levels and one with two levels, giving a three-way structure with 18
experimental treatments. For the moment, we consider two treatment factors only.
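As a small illustration of this terminology, the full set of experimental treatments in a factorial structure is the Cartesian product of the factor levels. The sketch below uses the temperature and humidity levels from the potato storage example; the third factor (`durations`) is purely hypothetical and is included only to show a 3 × 3 × 2 structure.

```python
from itertools import product

# Factor levels from the potato storage example.
temperatures = ["3C", "5C", "7C"]
humidities = ["low (92.5%)", "normal (95%)"]

# A 3 x 2 factorial: one level from each treatment factor.
treatments = list(product(temperatures, humidities))
for temperature, humidity in treatments:
    print(temperature, humidity)
print("Number of experimental treatments:", len(treatments))

# A hypothetical third factor with three levels gives a 3 x 3 x 2 factorial.
durations = ["1 month", "3 months", "6 months"]  # assumed, not from the text
three_way = list(product(temperatures, durations, humidities))
print("Three-way structure:", len(three_way), "treatments")
```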

If an experiment has a factorial treatment structure, then usually the scientific questions
relate to both the overall effects of each treatment factor, and whether the different
treatment factors act independently or interact. This requires the use of a crossed structure
(see Section 3.2), which is expressed with the explanatory component of the model. The
potato storage experiment described above has this structure: both the temperature and
humidity treatments are of individual interest, as is the question of whether changing the
temperature also changes, or interacts with, the effect of the humidity treatment. Using the
obvious symbolic names, we can write this structure as

Explanatory component: [1] + Temperature*Humidity

                     = [1] + Temperature + Humidity + Temperature.Humidity

Recall that [1] represents a factor with only one group and that this term is associated with
the overall mean, μ. In this context, the terms Temperature and Humidity are called main
effects and the term Temperature.Humidity represents their interaction. This crossed
treatment structure is explained in detail in Sections 8.2 and 8.3.

Occasionally, a nested treatment structure will be more appropriate (see Section 3.2).
This happens when the levels of one treatment factor have no real meaning when
considered alone, but have meaning when considered in conjunction with another treatment
factor. In this case, the two factors are not independent and we denote this hierarchical
relationship in terms of a parent factor and a nested factor. For example, consider a small
variety trial designed to test four families each with six lines. This can be considered as
a 4 × 6 factorial structure for factors Family and Line, with lines numbered 1-6 within
families. But there is no connection between the nested lines with label 1 (or 2 or ... 6)
across families and so there is no interest in the overall effect of each line number. There is
interest in the overall effect of each family, as these constitute separate groups, and also in
whether there are differences between lines within families. Hence, the appropriate
treatment structure takes the form

Explanatory component: [1] + Family/Line

                     = [1] + Family + Family.Line

Here, there is only one main effect, for the parent factor Family, and the term Family.Line
represents the nested effects of lines within families. This nested treatment structure is
explained further in Section 8.4. The nested structure can also be useful when control or
standard treatments are included within the set of experimental treatments, but direct
comparison with other treatments is not of major interest, or when a control or standard
is added onto a factorial set. In this case, a nested structure can be used to partition the
experimental treatments into sets, and comparisons of interest are then made across and
within the sets. This structure is explained further in Section 8.5.

If the scientific questions do not relate directly to an underlying crossed or nested
structure, then testing specific hypotheses about differences between treatment effects, known
as contrasts, can often be an efficient approach. For example, consider a repellence screening
experiment in which four compounds were tested: standard, A alone, A with B, A with
C. The questions of interest are 'is the compound A as good as or better than the standard?'
and 'are either of the combinations A + B or A + C better than A alone?' We can construct
contrasts to incorporate these hypotheses into our analysis, and this matter is examined
in more detail in Section 8.6. Contrasts can also be used to make specific comparisons
within a crossed or nested structure. As an alternative to embedding contrasts within the
analysis, we can apply contrasts to tables of predicted means after the analysis, and this is
discussed in Section 8.8.
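One way to picture such comparisons is as sets of coefficients, summing to zero, applied to the treatment means. The sketch below is an illustration only: the coefficient vectors are one plausible encoding of the two questions, and the treatment means are invented numbers rather than results from any experiment.

```python
treatments = ["standard", "A", "A+B", "A+C"]

# Each contrast is a set of coefficients (summing to zero) on the treatment means.
contrasts = {
    "A vs standard": [-1, 1, 0, 0],
    "A+B vs A alone": [0, -1, 1, 0],
    "A+C vs A alone": [0, -1, 0, 1],
}

# Hypothetical treatment means on an arbitrary repellence scale.
means = {"standard": 40.0, "A": 55.0, "A+B": 60.0, "A+C": 52.0}


def contrast_value(coefficients):
    """Estimated contrast: the coefficient-weighted sum of the treatment means."""
    return sum(c * means[name] for c, name in zip(coefficients, treatments))


estimates = {label: contrast_value(coeffs) for label, coeffs in contrasts.items()}
for label, value in estimates.items():
    print(f"{label}: {value:+.1f}")

# These particular coefficient vectors are not mutually orthogonal,
# because the mean for A alone appears in all three comparisons.
dot = sum(a * b for a, b in zip(contrasts["A+B vs A alone"],
                                contrasts["A+C vs A alone"]))
print("Dot product of the last two contrasts:", dot)
```

Judging whether an estimated contrast is larger than expected from background variation still requires a standard error from the ANOVA, which this sketch does not attempt.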

Finally, if one or more of the treatment factors have levels related to an underlying
quantitative scale, for example, amount of fertilizer applied, sowing date, dose of chemical or
temperature, then it may be desirable to model the response on that quantitative scale.
Polynomial contrasts can be used to build simple empirical models for the response, and
this can also be done within a crossed or nested treatment structure. For example, consider
a field trial to investigate factors affecting wheat establishment that tests five sowing dates
(each two weeks apart) for six different varieties. This is a 5 × 6 factorial experiment, with the
appropriate explanatory model being a crossed treatment structure. Polynomial contrasts
can be used to model the response as a linear or quadratic function of sowing date, and to test
whether this function is consistent across varieties. This approach is illustrated in Section 8.7.
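For equally spaced levels, linear and quadratic polynomial contrast coefficients can be obtained by centring the level values and removing from the squared term any component already explained by the constant and linear terms. The sketch below assumes the five sowing dates are coded as days from the first sowing (0, 14, ..., 56); that coding and the function name are assumptions made for illustration.

```python
def polynomial_contrasts(levels):
    """Return (linear, quadratic) orthogonal contrast coefficients for the levels."""
    n = len(levels)
    mean = sum(levels) / n
    linear = [x - mean for x in levels]            # orthogonal to the constant term

    squared = [c * c for c in linear]
    sq_mean = sum(squared) / n
    quadratic = [s - sq_mean for s in squared]     # remove the constant component
    # Remove any component along the linear contrast (zero for symmetric levels).
    slope = sum(q * l for q, l in zip(quadratic, linear)) / sum(l * l for l in linear)
    quadratic = [q - slope * l for q, l in zip(quadratic, linear)]
    return linear, quadratic


# Assumed coding: five sowing dates two weeks apart, in days from the first sowing.
sowing_days = [0, 14, 28, 42, 56]
linear, quadratic = polynomial_contrasts(sowing_days)

# Rescale to small whole numbers for readability.
linear_scaled = [round(c / 14) for c in linear]
quadratic_scaled = [round(c / 196) for c in quadratic]
print("Linear:   ", linear_scaled)
print("Quadratic:", quadratic_scaled)
```

After rescaling, the coefficients are the familiar (-2, -1, 0, 1, 2) and (2, -1, -2, -1, 2) for five equally spaced levels; both sum to zero and are orthogonal to each other.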
8.2 A Crossed Treatment Structure with Two Factors

A crossed treatment structure for two factors allows variation within the full set of treat- 
ments to be partitioned into the overall (or main) effects of the individual factors and the 
interaction between those factors. The main effects for a factor represent the common 
effect for each of its levels, when averaged over all levels of the other factor. For example, 
consider a controlled environment (CE) experiment to investigate the infectivity of four 
different strains of a pathogen species (labelled A-D), for artificially damaged (wounded) 
or unwounded leaves. This is a 4 x 2 factorial structure and both treatment factors are to 
be evaluated individually, as well as in combination, so a crossed treatment structure is 
appropriate here. The main effect for each pathogen strain represents an average effect 
across wounded and unwounded leaves. Conversely, the main effect of wounding repre- 
sents an average effect across all strains. 

Two factors are considered to act independently if the effect of applying them together is 
equivalent to adding together their main effects; any deviation from this pattern is known 
as an interaction. The concept of interaction is most easily represented in a graphical con- 
text, and we illustrate it using the CE experiment with different pathogen strains and 
wounding, where the response is a measure of pathogen growth within the leaf. Here, 
we consider the true, but unknown, response (rather than observed responses) in terms 
of population parameters, in the context of two different scenarios. The first scenario 
assumes independent action of the two factors and is represented in Figure 8.1a.

The main effect of each pathogen strain is based on the average of its growth across the 
wounded and unwounded treatments and it is clear that there is some difference in aver- 
age virulence among the strains. The individual main effects are considered as deviations 
from the overall mean, so that the main effect for strain A is a small negative value, for B 
is a large negative value, and both C and D are positive, with C larger than D. Similarly, 
the main effect for unwounded plants is based on the average growth across strains for 
this condition, expressed as a difference with the overall mean (represented at the
right-hand side of Figure 8.1a). The main effect for unwounded plants is negative and that for
wounded plants is positive, i.e. pathogen growth is greater in wounded leaves, and the
absolute values of these two effects are equal. If there is no interaction, then the growth for
each treatment combination should arise solely from the main effects. This would imply
that the difference in growth between wounded and unwounded plants should be the
same for all strains and so the pattern of growth across strains should be similar for both
wounded and unwounded plants. This is seen most clearly if we draw lines for growth
across strains in the wounded and unwounded plants separately: if there is no interaction,





FIGURE 8.1 

Pattern of response for four pathogen strains (A-D) with (o) or without (•) wounding in the case of (a) no
interaction and (b) strong interaction present. • indicates mean for each pathogen strain.

then these lines should be parallel, as in Figure 8.1a. This model is variously known as the
independence, main effects or additive model.

In the second scenario, we assume the presence of some interaction between the factors,
which is represented in Figure 8.1b. The main effects are calculated and represented
as in the previous scenario. Again, there are differences in average virulence between
the strains (C largest, B smallest with A and D intermediate) and more pathogen growth
in wounded than in unwounded leaves. However, here it is clear that the difference in
growth between the wounded and unwounded leaves changes according to the strain. For
example, the difference is small for strains A and C but much larger for strains B and D.
In consequence, the pattern of growth across strains for wounded leaves is substantially
different from the pattern for unwounded leaves, and the two lines are no longer parallel
(Figure 8.1b). The interaction effect for each treatment combination is the difference
between the growth expected under the independence model and the actual value.

In practice, of course, we do not know the true population response and have only a
sample of observations for each experimental treatment, which introduces variation. We
can plot the observed treatment means to get some insight into the presence of an
interaction, but these means are subject to uncertainty. As in previous chapters, we can use
ANOVA to obtain an estimate of background variation and use this to judge whether the
observed interaction effects are real, or if they can be attributed to background variation.
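The decomposition into main effects and interaction effects can be sketched for a table of true means. The numbers below are invented to mimic the two scenarios (they are not read from Figure 8.1), and the interaction effect is computed as the actual value minus the value expected under the additive model.

```python
def decompose(table):
    """table[r][s]: mean for level r of factor A (strain), level s of factor B."""
    a, b = len(table), len(table[0])
    grand = sum(map(sum, table)) / (a * b)
    # Main effects: level averages expressed as deviations from the overall mean.
    main_a = [sum(table[r]) / b - grand for r in range(a)]
    main_b = [sum(table[r][s] for r in range(a)) / a - grand for s in range(b)]
    # Interaction: actual value minus the value expected under additivity.
    interaction = [[table[r][s] - (grand + main_a[r] + main_b[s])
                    for s in range(b)] for r in range(a)]
    return grand, main_a, main_b, interaction


# Hypothetical growth for strains A-D (rows); columns are unwounded, wounded.
additive = [[4.0, 6.0], [2.0, 4.0], [7.0, 9.0], [6.0, 8.0]]      # parallel lines
interacting = [[4.0, 5.0], [1.0, 5.0], [7.0, 8.0], [4.0, 8.0]]   # lines not parallel

_, _, _, inter_add = decompose(additive)
grand, strain_effects, wound_effects, inter_int = decompose(interacting)

print("Additive scenario interaction effects:", inter_add)
print("Interacting scenario strain main effects:", strain_effects)
print("Interacting scenario wounding main effects:", wound_effects)
print("Interacting scenario interaction effects:", inter_int)
```

In the additive table, every interaction effect is zero because the wounded minus unwounded difference is the same for all strains; in the second table, that difference is small for strains A and C and large for B and D, so the interaction effects are non-zero.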

EXAMPLE 8.1A: BEETLE MATING 

Consider the beetle mating experiment described in Example 6.1 (data in file beetles.dat). 
Females from two species of willow beetle (P. vitellinae and P. vulgatissima) were mated 
with males from either their own species (intraspecies mating) or the other species (inter- 
species mating). There were 10 replicates of each of the four treatment combinations, and 
the number of eggs laid by each female was recorded. Example 6.1 established that a log 
transformation is required to homogenize the variance (we used a log10 transformation),
and considered the structure as a single set of four treatments. Here, we use a crossed 
treatment structure to address the question of interest, namely the viability of interspe- 
cies mating. We represent the four treatments as a factorial combination of two factors, the 
species of the female and the type of mating. 

By using this structure, we partition the variation between treatments into that due to 
each of the two main effects and their interaction. The main effect of species represents the 
logged number of eggs produced by females of each species, averaged across mating types. 
Conversely, the main effect of mating type represents the logged number of eggs produced 
for each type of mating, averaged across species. The interaction examines whether there 
is any difference in logged numbers of eggs between types of mating across females of 
the two species. The observations are plotted with the four treatment means in Figure 8.2. 

It is clear that females of species P. vulgatissima appear more fecund and that interspe- 
cies mating is generally less productive than intraspecies mating. The lines that join 
means for the same mating type across species are not parallel, suggesting that an inter- 
action may be present, and the loss of productivity due to interspecies mating appears 
smaller for species P. vulgatissima than for species P. vitellinae. However, it is also clear 
that there is much variation in the observed logged numbers, and the apparent interac- 
tion must be evaluated in the context of this background variation. 



8.2.1 Models for a Crossed Treatment Structure with Two Factors 

In the previous section, a set of experimental treatments was decomposed into components 
associated with the underlying treatment factors. In this section, we write this decomposi- 
tion in the form of a statistical model. For simplicity, here we assume that the experimental 
FIGURE 8.2

Observed productivity, measured as log10(number of eggs), for inter- (o) and intraspecies (•) mating for two
species of willow beetle, with means for inter- (larger o) and intraspecies (larger •) mating joined across species
(Example 8.1A). Horizontal axis: species of female.



units are unstructured, i.e. with no structural component, and that the treatments form a 
factorial set with equal replication. A model for fhe observafions in ferms of a single sef of 
freafmenfs can be wriffen as in Equafion 4.6 as 

$$y_{jk} = \mu + \tau_j + e_{jk} \qquad (8.1)$$

where $y_{jk}$ is the observed response for the kth replicate of the jth treatment group, for
$j = 1 \ldots t$ and $k = 1 \ldots n$, with deviation $e_{jk}$; $\mu$ is the overall mean and $\tau_j$ is the
effect of the jth treatment group, expressed as a deviation from the overall mean. The assumptions
of Section 4.1 all still apply. Recall from Section 4.5 that this form of the model uses the
sum-to-zero constraint $\sum_j \tau_j = 0$ to avoid over-parameterization. If we denote the responses as
variate Y and the factor labelling the treatment groups as Treatment, then we can write this
model in symbolic form as

Response variable: Y

Explanatory component: [1] + Treatment

The term [1] is associated with the overall mean, and the term Treatment is associated with
the set of treatment effects, $\tau_j$, $j = 1 \ldots t$.

To write the model in terms of the individual factors, we must relabel the observations
in terms of those individual factors. In the case of a generic crossed structure constructed
from two factors, we denote these factors as A and B. Factor A has $t_A$ levels and factor B
has $t_B$ levels, and their product gives the total number of treatments, i.e. $t = t_A \times t_B$.
The subscripts r and s are used to indicate the level of factors A and B, respectively, present
on each unit. The statistical model above can then be rewritten, with this new labelling, as

$$y_{rsk} = \mu + \tau_{rs} + e_{rsk} .$$

The jth treatment group is now acknowledged as arising from the combination of the
rth level of treatment factor A with the sth level of treatment factor B but, apart from this
cosmetic change, this is exactly the same linear model. This model gives exactly the same
ANOVA table and estimates as in the previous form but now labelled by the underlying
factors. Here, the estimate of each treatment effect is the deviation of the observed treatment
mean from the sample grand mean, as obtained previously, and can be written as

$$\hat{\tau}_{rs} = \bar{y}_{rs\cdot} - \bar{y} .$$

The crossed structure described above is written in symbolic form as

Explanatory component: [1] + A*B

= [1] + A + B + A.B

To implement this form, we decompose the unstructured treatment effects into main and
interaction effects as

$$\tau_{rs} = \alpha_r + \beta_s + (\alpha\beta)_{rs} , \qquad (8.2)$$

where $\alpha_r$ is the main effect for the rth group in factor A, $\beta_s$ is the main effect for the sth
group in factor B and $(\alpha\beta)_{rs}$ is the interaction effect for the rth group in factor A with the
sth group in factor B in term A.B. The composite symbol $(\alpha\beta)$ is used to show clearly which
terms the interaction has arisen from. This expression can be rearranged as

$$(\alpha\beta)_{rs} = \tau_{rs} - (\alpha_r + \beta_s) ,$$

so the interaction can be considered as the difference between the original treatment
effects and the additive model based on the assumption that factors act independently.
Alternatively, the interaction can be considered as the treatment effects adjusted for all
terms in the model that are marginal to it. A term is considered marginal to all terms of
which it is a sub-term; for example, terms A and B are both marginal to A.B. By convention
the overall mean, represented symbolically here as [1], is considered as marginal to all
other terms. We use this important concept both for calculating parameter estimates and
in identifying terms to be used for prediction.

The full model can then be written with a crossed treatment structure as

$$y_{rsk} = \mu + \alpha_r + \beta_s + (\alpha\beta)_{rs} + e_{rsk} .$$

In the sum-to-zero parameterization (introduced in Section 4.5), the main effects are written
as deviations about the sample grand mean, with the resulting constraints $\sum_r \alpha_r = 0$ and
$\sum_s \beta_s = 0$. The interaction effects are in turn written as deviations from the main effects,
with the resulting constraints $\sum_r (\alpha\beta)_{rs} = 0$ for $s = 1 \ldots t_B$ and $\sum_s (\alpha\beta)_{rs} = 0$ for
$r = 1 \ldots t_A$. Application of these constraints prevents the model from becoming over-parameterized.
This and other parameterizations are discussed further in Section 8.2.6.



8.2.2 Estimating the Model Parameters 

As usual, the model parameters are estimated by the method of least squares. Here, we
consider the simplest case of a full factorial structure with equal replication, which gives
an orthogonal structure; other cases are dealt with in Chapter 11. In this case, main effects
and interactions can be expressed as functions of the unstructured treatment effects,
and their estimates can be similarly derived from estimates of the unstructured treatment
effects, $\hat{\tau}_{rs} = \bar{y}_{rs\cdot} - \bar{y}$.

From Equation 8.2 and the parameter constraints $\sum_s \beta_s = 0$ and $\sum_s (\alpha\beta)_{rs} = 0$, it follows that

$$\sum_{s=1}^{t_B} \tau_{rs} = t_B \alpha_r + \sum_{s=1}^{t_B} \beta_s + \sum_{s=1}^{t_B} (\alpha\beta)_{rs} = t_B \alpha_r .$$

Hence, we can derive estimated main effects for factor A as

$$\hat{\alpha}_r = \frac{1}{t_B} \sum_{s=1}^{t_B} \hat{\tau}_{rs} = \frac{1}{t_B} \sum_{s=1}^{t_B} (\bar{y}_{rs\cdot} - \bar{y}) = \bar{y}_{r\cdot\cdot} - \bar{y} .$$

We can calculate this quantity by taking the marginal means of the unstructured treatment
effects across levels of factor B. Similarly, we can estimate the main effects for factor
B by taking the marginal means of the unstructured treatment effects across levels of
factor A, giving

$$\hat{\beta}_s = \frac{1}{t_A} \sum_{r=1}^{t_A} \hat{\tau}_{rs} = \bar{y}_{\cdot s\cdot} - \bar{y} .$$

The estimated interaction effect is equal to the unstructured treatment effect adjusted
for both main effects, as

$$(\widehat{\alpha\beta})_{rs} = \hat{\tau}_{rs} - (\hat{\alpha}_r + \hat{\beta}_s) = (\bar{y}_{rs\cdot} - \bar{y}) - [(\bar{y}_{r\cdot\cdot} - \bar{y}) + (\bar{y}_{\cdot s\cdot} - \bar{y})] = \bar{y}_{rs\cdot} - \bar{y}_{r\cdot\cdot} - \bar{y}_{\cdot s\cdot} + \bar{y} .$$

We can easily derive these estimates from a two-way table of treatment means, after
adjusting for marginal terms. This is demonstrated in Example 8.1B.
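
The estimation steps above can be sketched in a few lines of Python. This is only an illustration, assuming equal replication so that the simple marginal-mean formulae apply; the function name `crossed_effects` and the variable names are hypothetical, and the input values are the four rounded treatment means for the beetle mating experiment given in Table 8.2a.

```python
def crossed_effects(means):
    """Sum-to-zero estimates from a two-way table of treatment means.

    `means[r][s]` is the observed mean for level r of factor A and level s
    of factor B. Equal replication is assumed (hypothetical helper).
    """
    t_a, t_b = len(means), len(means[0])
    grand = sum(sum(row) for row in means) / (t_a * t_b)
    row_means = [sum(row) / t_b for row in means]
    col_means = [sum(means[r][s] for r in range(t_a)) / t_a for s in range(t_b)]
    # Main effects: marginal means expressed as deviations from the grand mean.
    alpha = [m - grand for m in row_means]
    beta = [m - grand for m in col_means]
    # Interaction: cell mean adjusted for both margins.
    interaction = [
        [means[r][s] - row_means[r] - col_means[s] + grand for s in range(t_b)]
        for r in range(t_a)
    ]
    return grand, alpha, beta, interaction


# Rows: P. vitellinae, P. vulgatissima; columns: interspecies, intraspecies.
beetle_means = [[1.5129, 1.8036], [1.9089, 2.0085]]
grand, species, mate_type, inter = crossed_effects(beetle_means)
print(round(grand, 4), [round(a, 4) for a in species], [round(b, 4) for b in mate_type])
print([[round(x, 4) for x in row] for row in inter])
```

Because the inputs are rounded to four decimal places, the results can differ from the tabulated estimates in the last digit.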



EXAMPLE 8.1B: BEETLE MATING 

The crossed model for the beetle mating experiment can be written in symbolic form
with factors Species (two levels) and MateType (two levels) as

Response variable: logEggs

Explanatory component: [1] + Species*MateType

= [1] + Species + MateType + Species.MateType

In mathematical form, this model becomes

$$logEggs_{rsk} = \mu + Species_r + MateType_s + (Species.MateType)_{rs} + e_{rsk} ,$$

where $logEggs_{rsk}$ is the log10-transformed number of eggs for the kth replicate measurement
(for $k = 1 \ldots 10$) of a female of species r (r = 1 for P. vitellinae, 2 for P. vulgatissima)
with mating of type s (s = 1 for interspecies, 2 for intraspecies). The main effect for species
r is denoted $Species_r$, with $MateType_s$ as the main effect for mating of type s, and
$(Species.MateType)_{rs}$ being the interaction between species r and mating type s. The overall
mean, $\mu$, and deviation, $e_{rsk}$, are as described above, and the sum-to-zero constraints take the
form $\sum_r Species_r = 0$, $\sum_s MateType_s = 0$, $\sum_s (Species.MateType)_{rs} = 0$ for r = 1, 2, and
$\sum_r (Species.MateType)_{rs} = 0$ for s = 1, 2.

The treatment means on the log10 scale for the four combinations of species and mating
type were presented in Table 6.3. This table is reformatted as a two-way table with
marginal means in Table 8.2a. As we have equal replication, the estimates take a simple
form. The estimate of the overall mean is the overall mean of the table, i.e. $\hat{\mu} = 1.8085$.
We can obtain estimates of the unstructured treatment effects by subtracting the overall
mean from each of the other cells, as in Table 8.2b. The main effect estimates, which consist
of marginal means adjusted for the overall mean, are now present in the margins.
We then subtract the two corresponding main effect estimates (the margins of Table 8.2b)
from each of the internal cells to obtain the interaction effects. The full set of estimates
is shown in Table 8.2c. Notice that the values within each set of effects (main effects or
interactions) have the same absolute value but differ in sign. This is a direct consequence
of the sum-to-zero constraints, and always occurs for factors with two levels.



TABLE 8.2a
Treatment and Marginal Means for log10(Number of Eggs) in the Beetle Mating Experiment
(Example 8.1B)

                                  Mating Type
Species of female     Interspecies    Intraspecies    Average
P. vit.               1.5129          1.8036          1.6582
P. vulg.              1.9089          2.0085          1.9587
Average               1.7109          1.9060          1.8085



TABLE 8.2b
Subtract Overall Mean from All Other Cells in Table 8.2a to Get Estimates of Main Effects in
Margins

                                  Mating Type
Species of female     Interspecies                          Intraspecies                          Average
P. vit.               -0.2956                               -0.0049                               $\widehat{Species}_1 = -0.1503$
P. vulg.              0.1005                                0.2000                                $\widehat{Species}_2 = 0.1503$
Average               $\widehat{MateType}_1 = -0.0976$      $\widehat{MateType}_2 = 0.0976$       $\hat{\mu} = 1.8085$



TABLE 8.2c
Subtract Row and Column Main Effect Estimates from Internal Cells in Table 8.2b to Get Estimated
Interaction Effects. $(Sp.MT)_{rs}$ is an Abbreviation of $(Species.MateType)_{rs}$

                                  Mating Type
Species of female     Interspecies                          Intraspecies                          Average
P. vit.               $(\widehat{Sp.MT})_{11} = -0.0478$    $(\widehat{Sp.MT})_{12} = 0.0478$     $\widehat{Species}_1 = -0.1503$
P. vulg.              $(\widehat{Sp.MT})_{21} = 0.0478$     $(\widehat{Sp.MT})_{22} = -0.0478$    $\widehat{Species}_2 = 0.1503$
Average               $\widehat{MateType}_1 = -0.0976$      $\widehat{MateType}_2 = 0.0976$       $\hat{\mu} = 1.8085$



8.2.3 Assessing the Importance of Individual Model Terms

In Section 4.3, the one-way ANOVA table was obtained for an unstructured set of treatments
that partitioned the total sum of squares into the treatment and residual sums of
squares. The partitioning of the treatment effects into main effects and their interaction
leads to a similar partitioning of the treatment sum of squares into components relating to
the three model terms, as

$$TrtSS = SS(A) + SS(B) + SS(A.B) ,$$

where SS(A) and SS(B) are the sums of squares for the main effects of factors A and B,
respectively, and SS(A.B) is the sum of squares for the interaction. In general, we use the form
SS(+A.B | A + B) for the interaction sum of squares, to denote that the interaction has been
added into the model after both main effects. This emphasises the fact that the value of the
interaction sum of squares depends on which other terms have been previously fitted in the
model, and this is discussed further in Section 11.2.2. These sums of squares are calculated as

$$SS(A) = n \times t_B \sum_{r=1}^{t_A} (\bar{y}_{r\cdot\cdot} - \bar{y})^2 , \qquad SS(B) = n \times t_A \sum_{s=1}^{t_B} (\bar{y}_{\cdot s\cdot} - \bar{y})^2 ,$$

$$SS(A.B) = n \sum_{r=1}^{t_A} \sum_{s=1}^{t_B} (\bar{y}_{rs\cdot} - \bar{y}_{r\cdot\cdot} - \bar{y}_{\cdot s\cdot} + \bar{y})^2 ,$$

or, equivalently, as the sums of squares of the parameter estimates

$$SS(A) = n \times t_B \sum_{r=1}^{t_A} \hat{\alpha}_r^2 , \qquad SS(B) = n \times t_A \sum_{s=1}^{t_B} \hat{\beta}_s^2 , \qquad SS(A.B) = n \sum_{r=1}^{t_A} \sum_{s=1}^{t_B} (\widehat{\alpha\beta})_{rs}^2 .$$

The total treatment degrees of freedom (equal to t - 1) are partitioned in a similar manner.
There are $df(A) = t_A - 1$ df associated with factor A, $df(B) = t_B - 1$ df associated with factor
B and the interaction df is calculated by subtraction as

$$df(A.B) = (t - 1) - (t_A - 1) - (t_B - 1) = (t_A \times t_B) - t_A - t_B + 1 = (t_A - 1) \times (t_B - 1) .$$

This leads to the generic form of ANOVA table shown in Table 8.3.



TABLE 8.3
Generic Form of ANOVA Table for a Crossed Treatment Structure with Two Factors
A and B, with $t_A$ and $t_B$ Levels, Respectively

Source of Variation   df                      Sum of Squares   Mean Square   Variance Ratio
A                     $t_A - 1$               SS(A)            MS(A)         $F^{A}$ = MS(A)/ResMS
B                     $t_B - 1$               SS(B)            MS(B)         $F^{B}$ = MS(B)/ResMS
A.B                   $(t_A - 1)(t_B - 1)$    SS(A.B)          MS(A.B)       $F^{A.B}$ = MS(A.B)/ResMS
Residual              $N - t$                 ResSS            ResMS
Total                 $N - 1$                 TotSS

Note: No structural component present, all $t$ ($= t_A \times t_B$) treatment combinations have n replicates
giving $N = n \times t$ observations.






As previously, mean squares are calculated by division of the sums of squares by their 
degrees of freedom. There are now three variance ratios, formed by division of the mean 
squares for each of the two main effects and the interaction by the residual mean square 
(ResMS). Note that the ResMS and ResDF take the same values as when the treatments 
were considered as an unstructured set. 

We have approached the factorial structure by building a crossed model for the treat- 
ment effects using the underlying factors. In terms of understanding the biological system, 
it is helpful to identify as simple a model as possible for prediction, subject to its being 
consistent with the observed data. We start with the most complex model, containing all 
of the terms, and try to identify terms that can be ignored for prediction. We consider the 
interaction term first, which holds information on dependencies in the response across the 
factors. If this term is not statistically significant then we can predict from the main effect 
terms only, asserting independent action of the two factors, and we might even be able 
to simplify the model further. If the interaction is statistically significant then prediction 
from a simpler model is not sensible. 

The variance ratio for the interaction, denoted $F^{A.B}$ in Table 8.3, can be used to test the
null hypothesis that all of the interaction effects equal zero, or $H_0$: $(\alpha\beta)_{rs} = 0$ for all
$r = 1 \ldots t_A$, $s = 1 \ldots t_B$, against the general alternative hypothesis that the interaction
effects are not all zero. Under the null hypothesis, the variance ratio $F^{A.B}$ has an F-distribution
with $(t_A - 1) \times (t_B - 1)$ numerator and $N - t$ denominator df. Recall that we often specify these
two df as a subscript, and we shall sometimes also abbreviate the factor names in the
superscript for brevity. If $F^{A.B}$ exceeds the chosen critical value of this F-distribution, then
we have statistical evidence for an interaction, which should not be ignored. If the interaction
is not statistically significant then patterns in the response can be adequately represented
by the main effects alone. Further simplification may be possible, however, and so
we next examine each of the main effects in turn.

Because the structure is orthogonal when all treatment combinations are equally replicated,
the order in which these terms are examined is not important, but we choose to work
our way up the ANOVA table. The variance ratio for factor B, denoted $F^{B}$, can be used to
test the null hypothesis that the main effects for this factor are all equal to zero, or
$H_0$: $\beta_s = 0$ for all $s = 1 \ldots t_B$, against the general alternative that they are not all zero.
Under the null hypothesis, the variance ratio $F^{B}$ has an F-distribution with $t_B - 1$ numerator
and $N - t$ denominator df. If $F^{B}$ exceeds the chosen critical value of this distribution, this gives
statistical evidence for the presence of main effects for factor B. A similar process is followed to
test the main effects for factor A, using the mean square and df associated with that factor.

EXAMPLE 8.1C: BEETLE MATING 

Table 8.4 is the ANOVA table for the crossed treatment structure in the beetle mating 
experiment. As expected, TotSS and ResSS and their degrees of freedom are the same 
as for the unstructured analysis shown in Table 6.2, and the sums of squares for the 
main effects and the interaction in Table 8.4 together add up to the TrtSS in Table 6.2, 
i.e. 1.37515 = 0.90307 + 0.38073 + 0.09135. Similarly, the main effect and interaction df in 
Table 8.4 add up to the TrtDF in Table 6.2, i.e. 3 = 1 + 1 + 1. This verifies numerically that 
our decomposition into main effects and the interaction is a partitioning of the total 
treatment information. 

The variance ratio for the Species.MateType interaction is $F^{\text{Species.MateType}}_{1,36} = 3.837$, with
observed significance level P = 0.058. Taking a strict approach to hypothesis testing with a 5%
significance level, the null hypothesis would not be rejected, and we should conclude
that there is no statistical evidence of an interaction. However, taking a more pragmatic
approach, there is some suggestion that an interaction might be present and we might
consider the situation further. In this case, the value of the variance ratio for the interaction
is much smaller than those for the main effects, suggesting that even if an interaction
is present, its effect is relatively small. These issues are discussed further at the
end of Section 8.3. For now, we decide that we shall not greatly misrepresent the data
by omitting the interaction, and so the model of independent action is appropriate. As
both main effects are highly statistically significant, we conclude that both the species
and type of mating affect the expected log10 number of eggs, but the expected decrease
due to interspecies mating is similar for both species.

TABLE 8.4
ANOVA Table for log10(Number of Eggs) from the Beetle Mating Experiment Using a
Crossed Treatment Structure (Factors Species and MateType) (Example 8.1C)

Source of Variation   df   Sum of Squares   Mean Square   Variance Ratio                              P
Species               1    0.9031           0.9031        $F^{\text{Species}} = 37.932$               < 0.001
MateType              1    0.3807           0.3807        $F^{\text{MateType}} = 15.992$              < 0.001
Species.MateType      1    0.0913           0.0913        $F^{\text{Species.MateType}} = 3.837$       0.058
Residual              36   0.8571           0.0238
Total                 39   2.2322
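
As a quick arithmetic check, the variance ratios in Table 8.4 can be recomputed from the sums of squares and df quoted in Example 8.1C. The sketch below is illustrative only: the dictionary and variable names are hypothetical, and observed significance levels are not computed because that would require an F-distribution routine from a statistical library.

```python
# Sums of squares and df as quoted in Example 8.1C / Table 8.4.
treatment_terms = {
    "Species": (1, 0.90307),
    "MateType": (1, 0.38073),
    "Species.MateType": (1, 0.09135),
}
res_df, res_ss = 36, 0.8571

res_ms = res_ss / res_df  # residual mean square
variance_ratios = {
    term: (ss / df) / res_ms for term, (df, ss) in treatment_terms.items()
}

# The three components should add up to the unstructured TrtSS and TrtDF.
trt_ss = sum(ss for _, ss in treatment_terms.values())
trt_df = sum(df for df, _ in treatment_terms.values())

for term, ratio in variance_ratios.items():
    print(f"{term}: variance ratio = {ratio:.3f}")
print(f"TrtSS = {trt_ss:.5f} on {trt_df} df")
```

Small differences in the third decimal place relative to Table 8.4 are expected, since the residual sum of squares is quoted to only four decimal places.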



8.2.4 Evaluating the Response to Treatments: Predictions from the Fitted Model 

The ANOVA table is used to identify the subset of model terms that best describe the 
pattern of response across treatment groups. This subset can then be used to predict the 
expected response for any treatment combination or the expected difference in response 
between treatment combinations. If there is evidence of an interaction between the fac- 
tors, then predictions must be based on the full model. If the interaction is not statisti- 
cally significant, then it can be ignored for the purposes of prediction, together with any 
non-significant main effect(s). The remaining terms are used as the model for prediction, 
and there are several ways to approach this. One way is to refit the model containing the 
selected terms only, then obtain predictions from the revised model. This approach is 
always necessary for non-orthogonal structures, as discussed in Chapter 11, as the value of 
the predictions will depend on the model terms fitted. However, one unsatisfactory aspect 
of this procedure is that it involves re-estimation of the background variation by the pool- 
ing of true background variation (based on differences between replicates) with that from 
terms dropped from the model (based on differences between treatment combinations). 
In orthogonal structures, such as a factorial with equal replication (or certain forms of 
unequal replication, see Section 11.1) there is an alternative and more appropriate method 
which gives direct estimates of prediction standard errors based on the original estimate 
of background variation. This method uses the result that the multi-way table of observed 
treatment means, and its margins, give direct estimates of certain population treatment 
means, and that standard errors of these means, and their differences, are easy to derive. 
This is the approach that we outline below. 

We extend our previous notation for population treatment means (Section 4.1) so that
$\mu_{rs}$ denotes the population mean for the rth level of factor A and the sth level of factor B,
which implies

$$\mu_{rs} = \mu + \alpha_r + \beta_s + (\alpha\beta)_{rs} .$$

Then $\mu_{r\cdot}$ is defined as the expected mean for the rth level of factor A, averaged over the
levels of factor B present, and $\mu_{\cdot s}$ is defined similarly as the expected mean for the sth level
of factor B, averaged over the levels of factor A present in the experiment.

If the interaction is significant, then the full model is used for prediction, with

$$\hat{\mu}_{rs} = \hat{\mu} + \hat{\alpha}_r + \hat{\beta}_s + (\widehat{\alpha\beta})_{rs} = \bar{y}_{rs\cdot} .$$

In this case, the prediction for each treatment combination is equal to the observed treatment
mean. The variance of this mean is equal to the background variation divided by
its replication, and the set of such means are mutually independent. As in Section 4.3, we
estimate the background variation using $s^2$ = ResMS, and so we can estimate the SE of a
prediction (previously denoted SEM), or a difference between predictions for two treatment
combinations (previously denoted SED), as

$$\widehat{\mathrm{SE}}(\hat{\mu}_{rs}) = \sqrt{\frac{s^2}{n}} , \qquad \widehat{\mathrm{SE}}(\hat{\mu}_{rs} - \hat{\mu}_{ij}) = \sqrt{\frac{2 s^2}{n}} .$$

The second expression only holds for treatment combinations ij such that $ij \neq rs$. If the
interaction is not statistically significant, but both main effects are significant, then predictions
for each treatment combination are made as

$$\hat{\mu}_{rs} = \hat{\mu} + \hat{\alpha}_r + \hat{\beta}_s = \bar{y}_{r\cdot\cdot} + \bar{y}_{\cdot s\cdot} - \bar{y} .$$

We then note that prediction for the rth level of factor A (averaged across all levels of factor
B, and using $\sum_s \hat{\beta}_s = 0$) is the mean of these predictions across levels of factor B, giving

$$\hat{\mu}_{r\cdot} = \hat{\mu} + \hat{\alpha}_r = \bar{y}_{r\cdot\cdot} .$$

These quantities are the observed marginal means for factor A, taken across levels of factor
B. Again, we have simple expressions for the estimated SEMs and SEDs as

$$\widehat{\mathrm{SE}}(\hat{\mu}_{r\cdot}) = \sqrt{\frac{s^2}{n \times t_B}} , \qquad \widehat{\mathrm{SE}}(\hat{\mu}_{r\cdot} - \hat{\mu}_{i\cdot}) = \sqrt{\frac{2 s^2}{n \times t_B}} .$$

Conversely, prediction for the sth level of factor B (averaged across all levels of factor A)
is

$$\hat{\mu}_{\cdot s} = \hat{\mu} + \hat{\beta}_s = \bar{y}_{\cdot s\cdot} ,$$

with

$$\widehat{\mathrm{SE}}(\hat{\mu}_{\cdot s}) = \sqrt{\frac{s^2}{n \times t_A}} , \qquad \widehat{\mathrm{SE}}(\hat{\mu}_{\cdot s} - \hat{\mu}_{\cdot j}) = \sqrt{\frac{2 s^2}{n \times t_A}} .$$

In fact, these results for main effects hold whether or not the interaction is retained in
the predictive model. The SEDs can be used to construct LSDs and confidence intervals as
previously shown in Section 4.4. Again, the df of the t-statistic used to calculate the LSDs
and confidence intervals is equal to the ResDF from the ANOVA table.

EXAMPLE 8.1D: BEETLE MATING 

In Example 8.1C, we found that the interaction between species and mating type was
not statistically significant and so we do not use this term for prediction. The model
predictions take the form

$$\hat{\mu}_{rs} = \hat{\mu} + \widehat{Species}_r + \widehat{MateType}_s ,$$

and we can summarize our results by presenting the predicted means for each factor
separately, i.e. averaging over the other factor. We are interested in the difference
between the two levels in each case and so use SEDs as a measure of uncertainty. We
form 95% CIs for the differences and back-transform these onto the original scale for
interpretation. These results are presented in Table 8.5.

The expected logged number of eggs for females of species P. vitellinae (Species
level 1) is 0.301 units smaller than for females of species P. vulgatissima (level 2), and
the expected logged number of eggs is 0.195 units smaller for interspecies mating
(MateType level 1) than for intraspecies mating (level 2). The estimated SEDs use
$s^2$ = ResMS = 0.0238 and $n \times t_A = n \times t_B = 20$. The LSD calculated at a 5% significance level
requires the 97.5th percentile of a t-distribution on ResDF = 36 df, i.e. $t_{36}^{[0.025]} = 2.028$, hence
LSD = SED $\times\ t_{36}^{[0.025]}$ = 0.0990. The 95% CI for each difference is equal to its estimate plus
or minus the LSD, and the back-transformation for prediction $\hat{\mu}_{rs}$ on the log10 scale is
$10^{\hat{\mu}_{rs}}$ (see Table 6.4).

As discussed in Section 6.4, a difference on any logarithmic scale back-transforms to a 
ratio on the original scale. From the back-transformed CIs, we can conclude that females 
of species P. vulgatissima lay on average 1.59-2.52 times as many eggs as females of spe- 
cies P. vitellinae. Similarly, intraspecies mating produces 1.25-1.97 times as many eggs as 
interspecies mating. These results are consistent with the back-transformed individual 
treatment means presented in Table 6.5, but are more readily interpreted in terms of the 
underlying factors. 
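
The SED, LSD and back-transformed intervals in this example can be reproduced approximately with the short sketch below. It is an illustration under stated assumptions: the inputs are the rounded values quoted in the example (ResMS = 0.0238, replication 20 per margin, critical value 2.028, differences 0.301 and 0.195), and the helper name `ratio_ci` is hypothetical.

```python
from math import sqrt

res_ms = 0.0238        # s^2 = ResMS from Table 8.4
n, t_other = 10, 2     # replicates per combination, levels of the other factor
t_crit = 2.028         # 97.5th percentile of t on 36 df, as quoted in the example

sed = sqrt(2 * res_ms / (n * t_other))  # SED for a difference of marginal means
lsd = t_crit * sed


def ratio_ci(diff):
    """95% CI for a difference on the log10 scale and its back-transform."""
    lower, upper = diff - lsd, diff + lsd
    return (lower, upper), (10 ** lower, 10 ** upper)


species_ci, species_ratio = ratio_ci(0.301)      # P. vulgatissima - P. vitellinae
matetype_ci, matetype_ratio = ratio_ci(0.195)    # intraspecies - interspecies
print(f"SED = {sed:.4f}, LSD = {lsd:.4f}")
print("Species ratio CI:", tuple(round(x, 2) for x in species_ratio))
print("MateType ratio CI:", tuple(round(x, 2) for x in matetype_ratio))
```

Because the inputs are rounded, the limits may differ from those in Table 8.5 by about one unit in the last reported digit.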

TABLE 8.5
Marginal Means, log10(Number of Eggs), for Species and Mating Type with Estimated
Differences, SEDs and Back-Transformed 95% Confidence Intervals (CI) (Example 8.1D)

                                  Species                                                      MateType
Prediction level 1                $\hat{\mu}_{1\cdot} = 1.658$                                 $\hat{\mu}_{\cdot 1} = 1.711$
Prediction level 2                $\hat{\mu}_{2\cdot} = 1.959$                                 $\hat{\mu}_{\cdot 2} = 1.906$
Difference (level 2 - level 1)    $\hat{\mu}_{2\cdot} - \hat{\mu}_{1\cdot} = 0.301$            $\hat{\mu}_{\cdot 2} - \hat{\mu}_{\cdot 1} = 0.195$
SED                               $\mathrm{SE}(\hat{\mu}_{2\cdot} - \hat{\mu}_{1\cdot}) = 0.0488$   $\mathrm{SE}(\hat{\mu}_{\cdot 2} - \hat{\mu}_{\cdot 1}) = 0.0488$
95% CI for difference             (0.202, 0.399)                                               (0.096, 0.294)
Back-transformed CI               (1.59, 2.52)                                                 (1.25, 1.97)



8.2.5 The Advantages of Factorial Structure

The use of factorial treatment structures leads to experiments with clear conclusions, and
they are to be recommended for this fact alone. However, they also enable us to design
efficient experiments whether interactions are present or not. For example, suppose that
the beetle mating experiment had been done as two separate experiments: one to compare
inter- versus intraspecies mating for P. vitellinae and the other to make the same
comparison for P. vulgatissima, each using 10 replicates per treatment as in the original
experiment. These two separate experiments use the same total number of experimental
units as the factorial experiment. However, because these new experiments were run at
different times, there is no way to assess the interaction, i.e. whether the decrease in the
logged number of eggs laid after interspecies mating differs between the two species, as
any difference may be due to some change in the background environment rather than
the change in species. In contrast, the interaction can be tested directly in the factorial
experiment. In addition, the replication for testing differences between main effects is
20 for both factors Species and MateType (as in Example 8.1D), rather than 10 for the
individual experiments, which gives an increase in both precision and power. Finally, the
main effects in a factorial experiment are tested over a range of conditions (corresponding
to the other factor levels) and any main effects that emerge must be consistent over these
conditions. Hence, a factorial experiment arguably provides a more broadly applicable
estimate of main effects than an experiment that tests a single factor with other conditions
held fixed.



8.2.6 Understanding Different Parameterizations 

In Section 4.5, we described several different forms of parameterization for a model with 
an unstructured set of treatments. Similar forms of parameterization can be applied to 
structured models, such as the crossed models discussed in this section. 

The sum-to-zero parameterization described in Section 8.2.1 has $1 + t_A + t_B + (t_A \times t_B)$
parameters, from the overall mean, two sets of main effects and the interaction, respectively.
As we have only $t_A \times t_B$ treatment groups, not all of these parameters can be estimated
uniquely and so constraints are imposed. The sum-to-zero parameterization
imposes $1 + t_A + t_B$ constraints: $\sum_r \alpha_r = 0$ (1 constraint), $\sum_s \beta_s = 0$ (1 constraint),
$\sum_r (\alpha\beta)_{rs} = 0$ ($t_B$ constraints) and $\sum_s (\alpha\beta)_{rs} = 0$ ($t_A$ constraints). This looks like
$2 + t_A + t_B$ constraints, but in fact the latter two sets contain one dependency, so there are only
$1 + t_A + t_B$ separate constraints in total. We therefore obtain a total of $t_A \times t_B$ unconstrained
parameters, equal to the number of separate groups, as required.

As in the one-way unstructured model, we can also use the first-level-zero parameterization
introduced in Section 4.5 for the crossed model with two factors. In this case, we
write the model in the form

$$y_{rsk} = \mu_{11} + \eta_r + \zeta_s + (\eta\zeta)_{rs} + e_{rsk} ,$$

with constraints $\eta_1 = 0$, $\zeta_1 = 0$, $(\eta\zeta)_{r1} = 0$ and $(\eta\zeta)_{1s} = 0$ for $r = 1 \ldots t_A$, $s = 1 \ldots t_B$.
Again there is one duplicate constraint within the latter two sets, so the total number of
constraints is equal to $1 + t_A + t_B$ as required. We use different symbols here to emphasize the
fact that interpretation of the effects differs from that in the sum-to-zero parameterization. Here,
the parameter $\mu_{11}$ represents the population mean for a group with the first level of both
factors. In terms of the sum-to-zero parameterization, we can write this as

$$\mu_{11} = \mu + \alpha_1 + \beta_1 + (\alpha\beta)_{11} .$$

Then $\eta_r$, $r = 1 \ldots t_A$, represents the difference between the rth and first levels of factor A at
the first level of factor B, i.e.

$$\eta_r = \alpha_r - \alpha_1 + (\alpha\beta)_{r1} - (\alpha\beta)_{11} .$$

Similarly $\zeta_s$, $s = 1 \ldots t_B$, represents the difference between the sth and first levels of factor B
at the first level of factor A, i.e.

$$\zeta_s = \beta_s - \beta_1 + (\alpha\beta)_{1s} - (\alpha\beta)_{11} .$$

The effects associated with the interaction term A.B,

$$(\eta\zeta)_{rs} = (\alpha\beta)_{rs} - (\alpha\beta)_{r1} - (\alpha\beta)_{1s} + (\alpha\beta)_{11} ,$$

can be thought of as deviations relative to the first row and column in the two-way table of
unstructured treatment effects. If we omit the interaction term, so that the model becomes

$$y_{rsk} = \mu_{11} + \eta_r + \zeta_s + e_{rsk} ,$$

then interpretation of some parameters changes: now $\eta_r = \alpha_r - \alpha_1$ is the expected difference
between observations with the rth and first levels of factor A for any given level of factor B;
similarly $\zeta_s = \beta_s - \beta_1$ is the expected difference between observations with the sth and first
levels of factor B for any given level of factor A. This change in interpretation, dependent
on whether an interaction is present in the model or not, is a major disadvantage of this
parameterization. Despite this, first-level-zero parameterization is commonly used to fit
linear models to non-orthogonal or unbalanced experiments. This is described in more
detail in Section 11.2.

To some extent, the parameterization used is unimportant, as the sum-to-zero and first-level-zero
parameterizations both result in the same predictions, fitted values and ANOVA
table (although the sums of squares are no longer equal to sums of squared parameter estimates
when using first-level-zero constraints). However, we might be somewhat confused
when looking at individual parameter estimates if we do not understand the nature of
the parameterization, and the default parameterization can vary between statistical packages,
or even between different commands within the same package. This is a good reason
for making inferences on population means rather than on individual parameters, as
described in Section 8.2.4.

Other parameterizations are also used, the most common being last-level-zero constraints
(see Section 4.5), which are closely related to the first-level-zero constraints but
constrain parameters associated with the last level of each factor rather than the first level.
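
To make the contrast between parameterizations concrete, the sketch below computes first-level-zero estimates for the full crossed model directly from a two-way table of cell means, and checks that they reproduce the same fitted values as the cell means themselves. It assumes the full model with equal replication, so that fitted values equal the observed treatment means; the function name `first_level_zero` is hypothetical, and the inputs are the rounded beetle means from Table 8.2a.

```python
def first_level_zero(cell_means):
    """First-level-zero estimates for the full two-factor crossed model.

    `cell_means[r][s]` is the mean for level r of A and level s of B
    (hypothetical helper; index 0 is the first level of each factor).
    """
    t_a, t_b = len(cell_means), len(cell_means[0])
    baseline = cell_means[0][0]  # mean for the first level of both factors
    # Differences from the first level of A, at the first level of B.
    eta = [cell_means[r][0] - baseline for r in range(t_a)]
    # Differences from the first level of B, at the first level of A.
    zeta = [cell_means[0][s] - baseline for s in range(t_b)]
    # Deviations relative to the first row and first column.
    eta_zeta = [
        [cell_means[r][s] - cell_means[r][0] - cell_means[0][s] + baseline
         for s in range(t_b)]
        for r in range(t_a)
    ]
    return baseline, eta, zeta, eta_zeta


beetle_means = [[1.5129, 1.8036], [1.9089, 2.0085]]
baseline, eta, zeta, eta_zeta = first_level_zero(beetle_means)
fitted = [
    [baseline + eta[r] + zeta[s] + eta_zeta[r][s] for s in range(2)]
    for r in range(2)
]
print(baseline, [round(e, 4) for e in eta], [round(z, 4) for z in zeta])
print([[round(x, 4) for x in row] for row in eta_zeta])
```

The individual estimates look quite different from the sum-to-zero estimates in Table 8.2c, yet the fitted values are identical, which illustrates why inference on predicted means is less prone to misreading than inference on individual parameters.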



8.3 Crossed Treatment Structures with Three or More Factors 

Multi-way crossed structures are formed from all combinations of levels of three or more 
factors. The same logic and modelling process followed for the two-way ANOVA also apply 
here, but interpretation of higher-order interactions, i.e. interactions among three or more 
factors, becomes more difficult. In this section, the analysis for a three-way crossed struc- 
ture is demonstrated, followed by a discussion of some issues with multi-way factorial 
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designs. Again, it is assumed that we have a full factorial structure with all experimental 
treatment combinations present with equal replication so that the structure is orthogonal 
(see Chapter 11 for defails for non-orfhogonal sfrucfures). As before, if is necessary fo rela- 
bel fhe observafions, here in ferms of fhree individual factors: factor A wifh levels, factor 
B wifh fg levels and factor C wifh f<- levels so fhe fofal number of freafmenfs ist = t/^xt^x t^. 
The subscripfs r, s and u are used fo indicafe fhe level of factors A, B and C presenf on each 
unif, respecfively, and fhe subscripf k is used fo disfinguish unifs wifh fhe same freafmenf 
combinafion. The model in ferms of an unsfrucfured freafmenf sef can fhen be wriffen as 

y rsuk M- '^rsu ^rsuk * 

As for fhe crossed model for fwo factors, we can now proceed fo decompose fhe freafmenf 
effecfs in ferms of a main effecf for each facfor and fheir inferacfions. We can do fhis by 
wrifing 



X rsu ^ r Ps y 11 "t" (0C[3)rs (OCy)ru "t" (Py)su “t" (0C[3y)rsw / 

where $\alpha_r$, $\beta_s$ and $\gamma_u$ are main effects for the rth group in factor A, sth group in factor B, and uth group in factor C, respectively. There are three two-factor interactions, the A.B interaction $(\alpha\beta)_{rs}$, the A.C interaction $(\alpha\gamma)_{ru}$ and the B.C interaction $(\beta\gamma)_{su}$, and a three-factor A.B.C interaction $(\alpha\beta\gamma)_{rsu}$. The three-factor interaction can be considered as the pattern remaining once the main effects and all two-factor interactions have been removed. The presence of a three-factor interaction implies a complex inter-dependency between levels of all three factors that can be hard to interpret. The full three-way crossed model can be written in symbolic form as

Explanatory component: [1] + A*B*C

= [1] + A + B + C + A.B + A.C + B.C + A.B.C

We again use sum-to-zero constraints, and parameter estimates can again be derived from the estimates of the unstructured treatment means. The TrtSS and TrtDF are now each partitioned into seven components as in Table 8.6.
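
The expansion of the crossed model into seven terms, and the matching partition of the treatment degrees of freedom, can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `crossed_terms_df` and the numbers of levels (2, 3 and 2) are assumptions chosen for the example, not values taken from the text.

```python
from itertools import combinations
from math import prod


def crossed_terms_df(levels):
    """Return {term: df} for all main effects and interactions of a full
    crossed structure; df for a term is the product of (levels - 1)."""
    names = list(levels)
    table = {}
    for order in range(1, len(names) + 1):
        for combo in combinations(names, order):
            table[".".join(combo)] = prod(levels[f] - 1 for f in combo)
    return table


# Hypothetical numbers of levels for factors A, B and C
levels = {"A": 2, "B": 3, "C": 2}
df_table = crossed_terms_df(levels)

n_treatments = prod(levels.values())   # t = t_A x t_B x t_C
treatment_df = sum(df_table.values())  # should equal t - 1

for term, df in df_table.items():
    print(f"{term:<6} df = {df}")
print("TrtDF =", treatment_df, "and t - 1 =", n_treatments - 1)
```

The seven term-wise degrees of freedom always add up to t - 1, which is a quick check that a crossed treatment structure has been specified as intended.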



TABLE 8.6

Generic Form of ANOVA Table for a Three-Way Crossed Treatment Structure with Factors A, B
and C, with $t_A$, $t_B$ and $t_C$ Levels, Respectively

Source of Variation   df                                 Sum of Squares   Mean Square   Variance Ratio
A                     $t_A - 1$                          SS(A)            MS(A)         $F^{A}$ = MS(A)/ResMS
B                     $t_B - 1$                          SS(B)            MS(B)         $F^{B}$ = MS(B)/ResMS
C                     $t_C - 1$                          SS(C)            MS(C)         $F^{C}$ = MS(C)/ResMS
A.B                   $(t_A - 1)(t_B - 1)$               SS(A.B)          MS(A.B)       $F^{A.B}$ = MS(A.B)/ResMS
A.C                   $(t_A - 1)(t_C - 1)$               SS(A.C)          MS(A.C)       $F^{A.C}$ = MS(A.C)/ResMS
B.C                   $(t_B - 1)(t_C - 1)$               SS(B.C)          MS(B.C)       $F^{B.C}$ = MS(B.C)/ResMS
A.B.C                 $(t_A - 1)(t_B - 1)(t_C - 1)$      SS(A.B.C)        MS(A.B.C)     $F^{A.B.C}$ = MS(A.B.C)/ResMS
Residual              $N - t$                            ResSS            ResMS
Total                 $N - 1$                            TotSS

Note: No structural component present, all $t$ (= $t_A \times t_B \times t_C$) treatment combinations have $n$ replicates giving
$N = n \times t$ observations.

8.3.1 Assessing the Importance of Individual Model Terms

As in the two-factor case, we might hope that we can find a simple model for prediction 
by ignoring one or more interaction terms. We still examine the most complex terms in the 
ANOVA table first, working upwards through the table. The procedure runs as follows. 
We first examine the three-way interaction. If it is significant then we cannot ignore any terms, and predictions are made from the multi-way table of means for all three treatment factors (A × B × C). If the three-way interaction is not significant then we proceed to examine the two-way interactions. If these are all significant then the model cannot be simplified further, and predictions can be made from the three two-way tables of marginal means (A × B, A × C and B × C). If one of the two-way interactions is not significant, say A × B, then predictions can be made from the other two-way tables of means (A × C and B × C). If only one of the two-way interactions is significant, say A × C, then predictions are made from that table of means and, if the B main effect is significant, the one-way table of means for factor B. If none of the two-way interactions is significant, then the main effects are examined, and predictions are made from the marginal means for the treatment factors with significant main effects.

This procedure may seem complicated, but it can be formalized with a diagram and the 
principle of marginality introduced in Section 8.2.1. Recall that all sub-terms of a model 
term are considered to be marginal to it; for example, terms A and B are marginal to term 
A.B. The principle of marginality requires that for each term in a model, all sub-terms 
should also be included. This is illustrated for a three-way factorial structure with factors 
A, B and C in Figure 8.3. 

The testing procedure starts at the bottom of the structure, with the three-way interaction. We go through an iterative process: at each step, we examine only terms that have no arrows leading away from them, i.e. that are not marginal to other terms. We test these terms, and we erase any non-significant terms and the arrows that lead to them. This process is repeated until no further progress can be made, and predictions are made using the remaining terms. Figure 8.4 illustrates the case where only the two-way interaction A.C and the main effect B are significant. At step 1, only the three-way interaction is tested. At step 2, the three-way interaction has been found non-significant, so all two-way interactions can be tested, and only A.C is found significant. At step 3, the main effect of B can now also be tested and is found significant and the process is finished. Predictions are made from all tables of means corresponding to terms without arrows leading away from them, here the A.C and B tables.

                [1]

         A       B       C

        A.B     A.C     B.C

               A.B.C

FIGURE 8.3

Marginality relationships for a three-way crossed structure with factors A, B and C. Arrows lead away from
each term towards any other terms to which it is marginal.

FIGURE 8.4

Eligibility of terms for testing in a three-way crossed structure as interactions are eliminated. Eligible terms at
each step are highlighted in bold. At each step arrows lead away from each term towards any other remaining
terms to which it is marginal.

This approach can be extended to multi-way factorial tables and treatment structures of any kind. We reiterate here that in this orthogonal scenario we do not refit the model, we merely select terms to use for prediction.
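
The iterative procedure based on marginality can be sketched as a short Python routine. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the set `significant` stands in for the outcomes of the F-tests, which would in practice be read from the ANOVA table. Terms are stored as sets of factor names, so that one term is marginal to another when it is a proper subset of it.

```python
from itertools import combinations


def crossed_terms(factors):
    """All main effects and interactions of a full crossed structure."""
    return {frozenset(c)
            for k in range(1, len(factors) + 1)
            for c in combinations(factors, k)}


def terms_for_prediction(factors, significant):
    """Test only terms that are not marginal to any remaining term,
    drop the non-significant ones, and repeat until nothing is eligible."""
    remaining = crossed_terms(factors)
    tested = set()
    while True:
        eligible = {t for t in remaining - tested
                    if not any(t < other for other in remaining)}
        if not eligible:
            break
        tested |= eligible
        remaining -= {t for t in eligible if t not in significant}
    # Predictions use the remaining terms that are not marginal to another
    return {t for t in remaining if not any(t < other for other in remaining)}


def label(term):
    return ".".join(sorted(term))


# Assumed test outcomes: only A.C and the main effect of B are significant
outcome = {frozenset({"A", "C"}), frozenset({"B"})}
selected = terms_for_prediction(["A", "B", "C"], outcome)
print(sorted(label(t) for t in selected))  # ['A.C', 'B']
```

With these assumed outcomes the routine reproduces the three steps described for Figure 8.4: the main effects A and C are never tested because they remain marginal to A.C, and predictions come from the A.C and B tables of means.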

EXAMPLE 8.2: LADYBIRD TRANSMISSION OF FUNGUS

An experiment was done to investigate the transmission of fungus by ladybirds onto aphids on two types of host plant (beans or birdsfoot trefoil; Ekesi et al., 2005). The experimental units were containers, each holding one plant with 20 aphids, and the space available was sufficient for 36 containers. A number of sporulating aphid cadavers (5, 10 or 20) were distributed on plants to provide different loads of infective material, with six plants of each host type at each load (36 containers). In three of the containers for each host × load combination, a ladybird was allowed to forage for four hours. The treatment allocations were made completely at random and the numbers of live and infected aphids per plant were counted after seven days. Because much variation was expected, this procedure was repeated, giving two runs each with a CRD structure, with six replicates of the 12 experimental treatments and 72 observations in total. The explanatory component for this experiment corresponds to a three-way crossed structure with treatment factors host type (factor Host with two levels), number of infective cadavers (factor Cadaver with three levels) and absence or presence of ladybirds (factor Ladybird with two levels). The two replicates introduce structure into the set of experimental units, and are labelled with factor Run (with two levels). As the actual randomization of plants within runs is not available we have arbitrarily labelled plants within each run using factor DPlant (with 36 levels). The data are shown in Table 8.7 and held in file ladybird.dat.

TABLE 8.7

Number of Infected and Live Aphids Used to Investigate the Transmission of Fungus (Cadaver
Dose) by Ladybirds (Presence/Absence) on Different Host Plants (Birdsfoot Trefoil or Beans)
(Example 8.2 and file ladybird.dat)

                      Host: Birdsfoot Trefoil                  Host: Bean
Ladybird   Cadaver    Run 1             Run 2             Run 1             Run 2
Presence   Dose       Infected  Live    Infected  Live    Infected  Live    Infected  Live
+          5          1         15      2         18      5         18      5         15
+          5          1         13      1         13      3         20      3         17
+          5          2         16      1         15      7         17      2         18
+          10         2         12      2         17      10        17      8         19
+          10         2         16      1         18      3         15      7         15
+          10         3         15      3         16      2         14      6         16
+          20         7         14      9         18      9         19      11        16
+          20         8         17      6         19      6         18      9         17
+          20         7         16      7         15      12        19      11        14
-          5          1         20      0         20      1         20      1         20
-          5          1         20      0         20      6         20      2         20
-          5          2         20      1         20      7         20      2         20
-          10         2         20      0         20      3         20      7         20
-          10         1         20      2         20      4         20      5         20
-          10         2         20      2         20      5         20      5         20
-          20         2         20      3         20      4         20      9         20
-          20         3         20      2         20      8         20      8         20
-          20         3         20      2         20      5         20      5         20

Source: Data from Rothamsted Research (J. Pell).

There was some predation by the ladybirds, so there were fewer than 20 live aphids (variate Live) in these containers (minimum 12), and the number of infected aphids (variate Infected) could not be directly compared across treatments. The percentage of infected aphids was therefore used as a measure of transmission. Preliminary analysis showed some variance heterogeneity, and so a logit transformation was applied to percentages of infection after adjustment for zero counts, i.e. Logitp = logit(Percent) where Percent = 100 × (Infected + 1)/(Live + 2). In symbolic form, the full model for this experiment can be written as

Response variable: Logitp

Explanatory component: [1] + Host*Cadaver*Ladybird

Structural component: Run/DPlant

This model can be written in mathematical form as 

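$$\mathit{Logitp}_{rsukl} = \mu + \mathit{Run}_k + \mathit{Host}_r + \mathit{Cadaver}_s + \mathit{Ladybird}_u + (\mathit{Host.Cadaver})_{rs}$$

$$\qquad + (\mathit{Host.Ladybird})_{ru} + (\mathit{Cadaver.Ladybird})_{su} + (\mathit{Host.Cadaver.Ladybird})_{rsu} + e_{rsukl} ,$$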



where $\mathit{Logitp}_{rsukl}$ is the logit-transformed percentage of infection for the lth replicate measurement (l = 1 ... 3) in the kth experimental run (k = 1, 2) for the rth host (r = 1 for beans, 2 for birdsfoot trefoil) with the sth cadaver dose (s = 1, 2, 3 for 5, 10 or 20 cadavers) with ladybirds absent (u = 1) or present (u = 2). The structural component generates effects for each run (from term Run), denoted $\mathit{Run}_k$ for k = 1, 2, and for each plant within each run (from term Run.DPlant), which are the model deviations (here equivalently labelled by treatments, runs and replicates). The main effect for the rth host is denoted $\mathit{Host}_r$, with $\mathit{Cadaver}_s$ as the main effect for the sth cadaver dose and $\mathit{Ladybird}_u$ as the main effect for ladybird absence or presence. The definition of the interaction terms and constraints follow as described above.
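
The adjusted logit transformation used for the response in this example can be sketched as follows. The function name `adjusted_logit` is an assumption for illustration; the formula is the one given in the text, Percent = 100 × (Infected + 1)/(Live + 2), followed by the natural-log logit of the percentage.

```python
import math


def adjusted_logit(infected, live):
    """Logit of the percentage infected, adjusted so that counts of zero
    (or counts equal to the number of live aphids) stay finite."""
    percent = 100.0 * (infected + 1) / (live + 2)
    return math.log(percent / (100.0 - percent))


# First container in Table 8.7: 1 infected out of 15 live aphids
print(round(adjusted_logit(1, 15), 3))
# A container with no infected aphids still gives a finite value
print(round(adjusted_logit(0, 20), 3))
```

Without the adjustment, a container with no infected aphids would give a percentage of zero and an undefined logit, which is why the constants 1 and 2 are added before transforming.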

The multi-stratum ANOVA table generated by this model is in Table 8.8. There are two strata, representing variation between runs (Run stratum) and variation within runs (Run.DPlant stratum). Treatments are estimated from comparisons within runs and thus appear in the lower (Run.DPlant) stratum. The ResDF differ from those in Table 8.6 as they have been adjusted, i.e. reduced by one, to account for the presence of runs. A composite set of residual plots (Figure 8.5) gives no evidence of departures from the model assumptions.
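
The variance ratios in the Run.DPlant stratum of Table 8.8 follow directly from the degrees of freedom and sums of squares, and can be checked with a short sketch. The dictionary layout and variable names below are assumptions for illustration; the numbers are those printed in Table 8.8, so small differences in the last decimal place can arise from rounding.

```python
# (df, sum of squares) for each treatment term, as printed in Table 8.8
anova = {
    "Host": (1, 13.5992),
    "Cadaver": (2, 17.0274),
    "Ladybird": (1, 11.0907),
    "Host.Cadaver": (2, 0.3078),
    "Host.Ladybird": (1, 0.2279),
    "Cadaver.Ladybird": (2, 1.7349),
    "Host.Cadaver.Ladybird": (2, 0.1999),
}
res_df, res_ss = 59, 13.5596  # residual line of the Run.DPlant stratum

res_ms = res_ss / res_df
variance_ratios = {
    term: (ss / df) / res_ms  # mean square divided by ResMS
    for term, (df, ss) in anova.items()
}

for term, ratio in variance_ratios.items():
    print(f"{term:<22} F = {ratio:.3f} on {anova[term][0]} and {res_df} df")
```

The observed significance levels would additionally require the F distribution, which is not part of the Python standard library and is therefore left out of this sketch.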

Working upwards from the bottom of the ANOVA table, we see that the three-way interaction is not statistically significant ($F^{H.C.L}_{2,59} = 0.435$, P = 0.649) and so can be excluded for prediction. Of the three two-way interactions, only the Cadaver.Ladybird interaction is significant ($F^{C.L}_{2,59} = 3.774$, P = 0.029) so we can also test the Host main effect, which is highly significant ($F^{H}_{1,59} = 59.172$, P < 0.001). The model predictions therefore use all main effects and the two-way Cadaver.Ladybird interaction, in addition to the Run term from the structural component, giving the mathematical form

$$\hat{\mu}_{rsuk} = \hat{\mu} + \widehat{\mathit{Run}}_k + \widehat{\mathit{Host}}_r + \widehat{\mathit{Cadaver}}_s + \widehat{\mathit{Ladybird}}_u + (\widehat{\mathit{Cadaver.Ladybird}})_{su} .$$

This gives a prediction for a specific run, but it is usually sensible to average over terms in the structural component which, using the sum-to-zero constraints, gives

$$\hat{\mu}_{rsu\cdot} = \hat{\mu} + \widehat{\mathit{Host}}_r + \widehat{\mathit{Cadaver}}_s + \widehat{\mathit{Ladybird}}_u + (\widehat{\mathit{Cadaver.Ladybird}})_{su} .$$



We can therefore predict patterns of transmission by looking at the two-way 
Cadaver x Ladybird table of means (averaged across hosts) and the marginal table of 
means for Host (averaged across cadaver concentrations and presence or absence of 
ladybirds) shown in Table 8.9. 

TABLE 8.8

Multi-Stratum ANOVA Table for the Logit-Transformed Percentage of Infected Aphids from the
Ladybird Transmission Experiment Performed in Two Blocks (Factor Run) Each Using 36 Plants
(Factor DPlant) with a Three-Way Crossed Treatment Structure (Factors Host, Cadaver and
Ladybird) (Example 8.2)

Source of Variation       df   Sum of Squares   Mean Square   Variance Ratio         P
Run stratum
  Residual                1    0.0677           0.0677        0.294                  0.589
Run.DPlant stratum
  Host                    1    13.5992          13.5992       $F^{H}$ = 59.172       < 0.001
  Cadaver                 2    17.0274          8.5137        $F^{C}$ = 37.044       < 0.001
  Ladybird                1    11.0907          11.0907       $F^{L}$ = 48.257       < 0.001
  Host.Cadaver            2    0.3078           0.1539        $F^{H.C}$ = 0.670      0.516
  Host.Ladybird           1    0.2279           0.2279        $F^{H.L}$ = 0.992      0.323
  Cadaver.Ladybird        2    1.7349           0.8675        $F^{C.L}$ = 3.774      0.029
  Host.Cadaver.Ladybird   2    0.1999           0.1000        $F^{H.C.L}$ = 0.435    0.649
  Residual                59   13.5596          0.2298
Total                     71   57.8151

FIGURE 8.5

Composite set of residual plots for the ladybird transmission of fungus experiment (Example 8.2).



The mean logit percentage of infected aphids was greater when the host plants were 
beans rather than birdsfoot trefoil, when ladybirds were present and as the concen- 
tration of cadavers increased. The Cadaver.Ladybird interaction is caused by a larger 
increase in transmission (on the logit scale) due to ladybird presence at a concentration 
of 20 cadavers per plant than at smaller concentrations. These patterns are easier to see 
if we plot the predictions, as shown in Figure 8.6. This figure also shows predictions 
calculated from the full set of model terms, i.e. including the terms found to be not 
statistically significant, and it is clear that discrepancies between the two sets are small. 



TABLE 8.9

Tables of Predicted logit(%Infection) (with Back-Transform as Percentage) for Cadaver × Ladybird
Interaction and Main Effect of Factor Host (Example 8.2)

Ladybird    Cadaver Concentration
Foraging    5                10               20
Present     -1.454 (18.9)    -1.033 (26.3)    0.044 (51.1)
Absent      -2.038 (11.5)    -1.580 (17.1)    -1.179 (23.5)
SED = 0.1957

Plant
Trefoil     -1.641 (16.2)
Bean        -0.772 (31.6)
SED = 0.1130

Note: SEDs apply to logit scale.

FIGURE 8.6

Logit(%Infection) predicted, for (a) birdsfoot trefoil and (b) beans, from the model with all main effects and the
Cadaver × Ladybird interaction for presence or absence of ladybird foraging, or from the full three-way
crossed model (Example 8.2).



This confirms that the terms used for prediction give a simple but accurate summary 
of patterns of response. Finally, these predictions can be back-transformed to predict % 
infection for each treatment combination, as in Table 8.9. 
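
The back-transformation from the logit scale to a percentage can be sketched as below. The function name `back_transform` is an assumption for illustration; the predicted values are taken from Table 8.9, and the sketch simply inverts the natural-log logit without undoing the small adjustment for zero counts.

```python
import math


def back_transform(logit_value):
    """Inverse of logit(P) = log(P / (100 - P)), returning a percentage."""
    return 100.0 / (1.0 + math.exp(-logit_value))


# Predicted logits from Table 8.9 (5 cadavers, ladybird present; 20 cadavers,
# ladybird present; host = bean)
for value in (-1.454, 0.044, -0.772):
    print(f"logit {value:+.3f} -> {back_transform(value):.1f}%")
```

Because the transformation is non-linear, the SEDs quoted in Table 8.9 apply only on the logit scale and cannot be back-transformed in the same way.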



8.3.2 Evaluating the Response to Treatments: Predictions from the Fitted Model 

We have provided a recipe for the identification of model terms for use in prediction, but 
common sense is also required. In many practical situations, the main effects are large compared with the interactions, which are sometimes regarded as modifications to (or departures from) the main effects model. Using this argument, we can expect that the sizes of the effects decrease as the order of interactions increases. We might therefore find ourselves in a situation where a high-order interaction is statistically significant but its effects are so small compared with the main effects that ignoring them has little impact on the biological conclusions. It can therefore be helpful to discuss the relative impact of terms when you report results. In an extreme situation, particularly where the number of ResDF is very large, one might detect high-order interactions where the effects are so small as to have no biological relevance. (In such cases, equivalence testing,
described in Section 10.5, can be useful to establish whether any biologically meaningful 
differences are present.) These considerations might suggest that it is more appropriate to 
work down the ANOVA table, starting with the main effects and respecting marginality, 
i.e. only testing terms for which all sub-terms are significant. However, in practice we have 
seen cases in which a two- (or higher) way interaction is significant but the corresponding 
main effects are negligible, and working down the table would then result in our drawing 
the wrong conclusions. 

It is also possible that the presence of interactions depends on the scale of analysis: this 
usually becomes apparent when different transformations are applied to tackle problems 
with the assumptions seen in the residual plots (Chapter 6). This behaviour is expected: 
there may be some scales on which the assumption of additivity (no interaction) is valid, 
and a simpler model can be found. This is useful in practice only if the model assumptions 
are also met on the same scale. Validating the assumptions should always take precedence
over simplifying the model.

In Section 8.2.5, the efficiency of a factorial structure for two treatment factors was discussed in comparison to two separate experiments testing each factor in isolation. Part of this efficiency arises from our being able to test for interactions between the factors and part arises from the additional (sometimes called hidden) replication available when no interaction is present. These advantages also apply to higher-order factorial structures, with three or more factors. However, the number of treatment combinations increases multiplicatively as the number of treatment factors increases, which can result in very many experimental treatments. This is often seen as a disadvantage of factorial structures, particularly if interactions between treatments are expected to be either small or non-existent. Several strategies can be used to tackle this problem. If the approach of a factorial structure is desirable, but the full replicated experiment would be too large to manage due to constraints of either space or time, then it may be possible to complete the experiment in several runs, each of which contains the full set of experimental treatments. This strategy was used in Example 8.2, in which six replicates of each treatment combination were required due to high variation but there was only enough growing space to manage three replicates at a time. The different time periods introduce structure into the experimental units that must be taken into account in the analysis, as was done in Example 8.2. This approach is feasible only if the experimental system is stable over time, so that treatment × time interactions are unlikely and the background variation will not change. If the experimental material cannot provide sets of homogeneous units large enough to contain the full set of experimental treatments, then designs are required in which each block contains only a subset of treatments. Designs with efficient blocking for factorial treatment structures are discussed in Section 11.3.2. Finally, if many of the possible interactions, especially higher-order interactions, are expected to be absent then it would be wasteful to replicate all treatment combinations. The class of fractional factorial designs was developed to deal with these cases and is discussed in Section 11.3.1.

In this and the previous section, we have assumed that the full factorial set of treatments is present and equally replicated. If this is the case then the structure is orthogonal, so that the estimates for each main effect are the same whether or not the other factor (and the interaction) is included in the model (see Section 11.1 for more details). This is the reason we can use a subset of terms for prediction. If treatment combinations are missing or unequally replicated, then the structure may become non-orthogonal and the statistical analysis becomes more complex, as described in Chapter 11. If one or more treatment combinations have been omitted, perhaps for sound practical reasons but without regard to the overall structure, then the individual factors are likely to become non-orthogonal. There are then several options. The treatments can always be analysed as an unstructured set, with contrasts used to explore specific comparisons (see Section 8.6), but this loses the advantages of the factorial structure. If the structure is still close to orthogonal, then a factorial analysis will often still be useful and the analysis for this situation is discussed further in Section 11.2. However, there are some special cases that deserve further attention here. Some schemes of unequal replication retain many of the properties of the full factorial structure. For example, if one level of a treatment factor has additional replication such that all treatment combinations involving that level have the same replication, then the structure remains orthogonal (see Section 11.1) and the analysis can proceed as described above. The only change is that estimated SEs for treatment means or differences involving that factor level will become smaller because of the additional replication. Similarly, sometimes subsets of experimental treatments can be omitted whilst preserving some of the factorial structure. This often applies when controls are present. For example, consider an experiment to compare the efficacy of different pesticides at several doses. The control, consisting of no dose, is the same across all pesticides so does not need to be replicated for each pesticide. If we regard this as a crossed structure of pesticide × dose plus an added control, then we retain the factorial structure in which we are interested, although a slightly more complex analysis is required, as described in Section 8.5.
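
The practical meaning of orthogonality can be illustrated with a small numerical sketch. The data below are entirely hypothetical, as are the function names. The sketch compares two estimates of the difference between two levels of factor A: one from the raw marginal means (ignoring factor B) and one averaged over the levels of B using the cell means. With equal replication the two agree; with unequal replication they need not.

```python
from statistics import mean


def a_difference_ignoring_b(data):
    """A2 - A1 from raw marginal means, i.e. ignoring factor B."""
    a1 = [y for (a, _), ys in data.items() if a == "A1" for y in ys]
    a2 = [y for (a, _), ys in data.items() if a == "A2" for y in ys]
    return mean(a2) - mean(a1)


def a_difference_within_b(data):
    """A2 - A1 from cell means, averaged with equal weight over levels of B."""
    b_levels = sorted({b for (_, b) in data})
    return mean(mean(data[("A2", b)]) - mean(data[("A1", b)]) for b in b_levels)


# Hypothetical responses: two replicates of every A x B combination
balanced = {
    ("A1", "B1"): [10, 12], ("A1", "B2"): [20, 22],
    ("A2", "B1"): [14, 16], ("A2", "B2"): [24, 26],
}
# The same layout with unequal replication across the combinations
unbalanced = {
    ("A1", "B1"): [10, 12], ("A1", "B2"): [20],
    ("A2", "B1"): [14], ("A2", "B2"): [24, 26],
}

for name, data in (("balanced", balanced), ("unbalanced", unbalanced)):
    print(name, a_difference_ignoring_b(data), a_difference_within_b(data))
```

In the unbalanced layout the comparison of the levels of A is partly confounded with the effect of B, which is why the choice of model terms then changes the estimates and a more complex analysis is needed.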



8.4 Models for Nested Treatment Structures 

In previous sections, we constructed models for treatment effects in terms of main effects and interactions for the treatment factors using a crossed structure. This structure is not always appropriate, and sometimes a nested structure is preferred. In these cases, the treatment factors usually fall into a natural hierarchy. For example, a forestry trial might assess a set of clones taken from a small set of mothers, and it is natural to think of clones as nested within mothers. Similarly, a laboratory trial to test the pathogenicity of a set of fungal isolates might use several isolates from several different races, with isolates considered to be nested within races. Or a study to examine aphid colonization of different hosts might use varieties from wheat, barley and oats as three different crop species, with varieties nested within species.

These types of trials require a somewhat different approach to partitioning the treatment information. Usually, there is interest in whether there is any significant variation between the higher-level grouping factor, for example, mothers, races or species in the examples above, in addition to variation at the lower level within groups. There are several ways to exploit this structure. We again start with a set of unstructured treatments corresponding to the model presented in Equation 8.1. To partition the treatment information, we relabel the treatments in terms of the groups, and then number the group members. Suppose there are $t_G$ groups, labelled by index r = 1 ... $t_G$, and that the rth group contains $t_r$ members, labelled by index s = 1 ... $t_r$. It is not necessary for all groups to have the same number of members. The relabelled treatment effects, $\tau_{rs}$, are then partitioned into two terms, and written in mathematical form as

$$\tau_{rs} = \gamma_r + \delta(\gamma)_{rs} . \qquad (8.3)$$

In this equation, we call $\gamma_r$ the parental effect, which is associated with the factor at the top level of the hierarchy. This is the average effect of all treatments in the rth group, expressed as a deviation about the overall mean. Note that the term parental here does not imply any genetic relationship, but just denotes the top level of a hierarchical nested structure. We call $\delta(\gamma)_{rs}$ the nested effect of the sth member of the rth group, and this is expressed as a deviation about the group mean. This interpretation uses sum-to-zero constraints, which implies $\sum_r \gamma_r = 0$ and $\sum_s \delta(\gamma)_{rs} = 0$. The first model term allows for differences among group means and the second allows for differences among group members about their group mean. We use the notation $\delta(\gamma)_{rs}$ to make a distinction between this nested term and the interaction term in the crossed models of Section 8.2 to emphasize the difference in interpretation due to the marginal terms present in each model. The term interaction implies that both marginal terms are present in the model, and interaction effects are deviations from the additive model that includes both main effects; the term nested effect implies that only one marginal term is present, and nested effects are deviations from that term only.

We can again illustrate this decomposition in terms of the original unstructured treatment estimates $\hat{\tau}_{rs} = \bar{y}_{rs\cdot} - \bar{y}$. Assuming equal replication for each of the members, we can again estimate the group parental effects as marginal means of the original treatment effects, as

$$\hat{\gamma}_r = \frac{1}{t_r} \sum_{s=1}^{t_r} \hat{\tau}_{rs} = \bar{y}_{r\cdot\cdot} - \bar{y} ,$$

with the nested effects calculated as the remainder after removing the parental effects, as

$$\hat{\delta}(\gamma)_{rs} = \hat{\tau}_{rs} - \hat{\gamma}_r = \bar{y}_{rs\cdot} - \bar{y}_{r\cdot\cdot} .$$

Two factors are required to express this model in symbolic form. The first factor, denoted Group, labels the groups and the second, denoted Member, labels members nested within groups; hence, the explanatory component of the model can then be written as

Explanatory component: [1] + Group/Member

= [1] + Group + Group.Member

Unfortunately, for the case of unequal numbers of members within groups, this expression will not generate a direct translation of Equation 8.3 in most statistical software, because the number of effects generated by the nested term Group.Member is equal to $t_G \times \max(t_r)$, i.e. the number of groups multiplied by the maximum number of members in a group. As there are no data on the absent factor combinations, it is not possible to estimate their effects and they are effectively ignored, so that estimates for present combinations are calculated as above. In statistical software, nested effects for absent combinations may be represented as zero or as a missing value.

The TrtSS and TrtDF in the ANOVA table are partitioned according to these two terms. The ANOVA sums of squares can again be calculated as the sum of squares of the effects for each term (for present combinations only). We do not give further details here, as the ANOVA table, estimates and SEs can be obtained from statistical software once the model has been correctly specified. This is illustrated in the example below.
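
The bookkeeping for a nested structure with unequal group sizes can be sketched as follows. The group sizes used here (nine members in one group and four in another) are hypothetical values chosen for illustration, and the variable names are assumptions. The sketch counts the effects generated by Group.Member, the combinations actually present, and the resulting partition of the treatment degrees of freedom.

```python
# Hypothetical numbers of members in each group
group_sizes = {"A": 9, "B": 4}

n_groups = len(group_sizes)
generated = n_groups * max(group_sizes.values())  # effects created by Group.Member
present = sum(group_sizes.values())               # combinations with data
absent = generated - present                      # combinations that are ignored

df_group = n_groups - 1                                 # parental (Group) term
df_nested = sum(t - 1 for t in group_sizes.values())    # nested (Group.Member) term

print("generated:", generated, "present:", present, "absent:", absent)
print("Group df:", df_group, "Group.Member df:", df_nested)
print("TrtDF:", df_group + df_nested, "= treatments - 1 =", present - 1)
```

The nested term contributes one fewer degree of freedom than the number of members in each group, summed over groups, so the two terms together always account for the number of treatments minus one.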



EXAMPLE 8.3A: SCREENING FOR PATHOGENICITY*

An experiment was done to screen a set of fungal isolates for pathogenicity on seedlings of oilseed rape. The isolates were collected from two different species of Brassica, labelled as A and B in factor Species, with several different isolates from each species being tested (nine in group A and four in group B), labelled by factor Isolate (with nine levels). The experiment was run in three replicates across time (factor Rep), with a tray of 22 (replicate 2) or 23 seedlings (replicates 1 and 3) being tested against each isolate in each run (factor Tray, with 13 levels). The number of seedlings tested was stored in variate Seedlings. The number of resistant seedlings, i.e. those showing no signs of infection (variate Resistant), was recorded five days after the isolates were applied. The percentage of resistant seedlings is the response to be analysed. The number responding in each tray is shown in Table 8.10 and the full data set is given in file brassica.dat. A preliminary analysis of these percentages showed heterogeneity of variance and so the percentage response, adjusted for zero counts as $P = 100 \times (\mathit{Resistant} + 1)/(\mathit{Seedlings} + 2)$, was logit-transformed to $\mathit{Logitp} = \log_e(P/(100 - P))$, which improved the residual plots (not shown).

TABLE 8.10

Number of Plants Showing Resistance to Isolates in the Pathogenicity
Screening Experiment (Example 8.3A and File brassica.dat)

        Replicate 1            Replicate 2            Replicate 3
Tray    Isolate   Resistant    Isolate   Resistant    Isolate   Resistant
1       B3        3            A3        2            A3        3
2       A6        14           A5        2            A7        5
3       A4        5            A8        8            A2        3
4       B1        2            A9        1            A5        2
5       A7        6            A4        1            B1        1
6       B2        2            A1        3            A8        16
7       A1        3            B2        2            A6        15
8       A9        1            B4        0            A1        4
9       A5        2            A7        4            B3        4
10      A2        3            B1        1            A9        0
11      A8        15           A2        1            B4        4
12      A3        4            B3        2            A4        4
13      B4        2            A6        9            B2        1

Note: Isolates are here labelled using combinations of the levels of factors Species (A or
B) and Isolate (1-9 for species A, 1-4 for species B).



A model in symbolic form for these responses could be written as 

Response variable: Logitp

Explanatory component: [1] + Species/Isolate

Structural component: Rep/Tray

The Species.Isolate term generates 2 × 9 = 18 effects, but those corresponding to species B with isolate numbers 5-9 are absent and so ignored. This model can be written in mathematical form as

$$\mathit{Logitp}_{rsk} = \mu + \mathit{Rep}_k + \mathit{Species}_r + \mathit{Isolate}(\mathit{Species})_{rs} + e_{rsk} ,$$



where $\mathit{Logitp}_{rsk}$ is the logit-transformed percentage of resistant seedlings in the kth replicate (k = 1 ... 3) for the rth species (r = 1, 2 for species A and B) with the sth isolate (for s = 1 ... $t_r$ where $t_1 = 9$ and $t_2 = 4$). The structural component generates the replicate effects, denoted $\mathit{Rep}_k$ for k = 1 ... 3 and the deviations $e_{rsk}$ (equivalent to the Rep.Tray effects). The parental effect of the rth species is denoted $\mathit{Species}_r$ with $\mathit{Isolate}(\mathit{Species})_{rs}$ being the nested effect of the sth isolate within the rth species. As usual, the overall mean is denoted by $\mu$. The sum-to-zero constraints take the form $\sum_k \mathit{Rep}_k = 0$, $\sum_r \mathit{Species}_r = 0$ and $\sum_s \mathit{Isolate}(\mathit{Species})_{rs} = 0$.

Table 8.11 shows the estimated parental (Species) and nested (Species.Isolate) effects derived from the unstructured set of treatment effects as described above. The multi-stratum ANOVA for the logit-transformed percentages is shown in Table 8.12. The Species sum of squares is equal to the sum of the squared parental effects (across all units), and the Species.Isolate sum of squares is equal to the sum of the squares of the estimated nested effects (across all units). The nested sum of squares here represents the accumulated within-species variation.
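
The calculation of parental and nested effects, and of the corresponding sums of squares, can be sketched using the unstructured treatment effects listed in Table 8.11. The variable names are assumptions for illustration, and because the inputs are rounded to three decimal places the results agree with Tables 8.11 and 8.12 only approximately.

```python
from statistics import mean

# Unstructured treatment effects from Table 8.11, grouped by species
effects = {
    "A": [0.011, -0.342, -0.101, -0.083, -0.415, 1.777, 0.418, 1.835, -1.110],
    "B": [-0.715, -0.565, -0.101, -0.609],
}
n_reps = 3  # each isolate appears once in each of the three replicates

# Parental effect: mean of the treatment effects within each species
parental = {sp: mean(vals) for sp, vals in effects.items()}

# Nested effect: treatment effect minus the parental effect of its species
nested = {sp: [e - parental[sp] for e in vals] for sp, vals in effects.items()}

# Sums of squares accumulate the squared effects over all units
ss_species = n_reps * sum(len(vals) * parental[sp] ** 2
                          for sp, vals in effects.items())
ss_nested = n_reps * sum(d ** 2 for vals in nested.values() for d in vals)

print({sp: round(p, 3) for sp, p in parental.items()})
print("Species SS ~", round(ss_species, 2))
print("Species.Isolate SS ~", round(ss_nested, 2))
```

Within each species the nested effects sum to zero by construction, which is the sum-to-zero constraint on the nested term expressed numerically.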

TABLE 8.11
Calculation of Species Parental Effects as the Mean of the Unstructured Treatment
Effects for Each Group, and of Species.Isolate Nested Effects as the Difference
between Unstructured Treatment Effects and Parental Effects (Example 8.3A)

Species   Isolate   Unstructured        Species            Species.Isolate
                    Treatment Effect    Parental Effect    Nested Effect
A         1          0.011               0.221             -0.210
A         2         -0.342               0.221             -0.563
A         3         -0.101               0.221             -0.322
A         4         -0.083               0.221             -0.304
A         5         -0.415               0.221             -0.636
A         6          1.777               0.221              1.555
A         7          0.418               0.221              0.197
A         8          1.835               0.221              1.613
A         9         -1.110               0.221             -1.331
B         1         -0.715              -0.498             -0.218
B         2         -0.565              -0.498             -0.068
B         3         -0.101              -0.498              0.397
B         4         -0.609              -0.498             -0.112

There is strong evidence of an overall difference in resistance to isolates from species
A and B ($F^{S}_{1,24} = 29.841$, P < 0.001) and also of variation in resistance to the isolates
within species ($F^{S.I}_{11,24} = 15.219$, P < 0.001). The treatment mean for isolates from species
A was -1.341 on the logit scale (back-transformed to 20.7%), and the mean for isolates
from species B was -2.060 (back-transformed to 11.3%), with SED = 0.1315, indicating
that fewer plants were resistant to isolates arising from species B.
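
The back-transformation quoted above can be checked with a short Python sketch. It assumes
the plain logit, log(p/(100 - p)), with no offset added to the percentages; the text does not
restate the exact form of the transformation here, so that is an assumption of the sketch,
and the function name inv_logit_percent is introduced only for illustration.

```python
import math


def inv_logit_percent(x):
    """Back-transform a value on the logit scale to a percentage."""
    return 100.0 / (1.0 + math.exp(-x))


# Treatment means on the logit scale, as quoted in the text.
for label, mean_logit in [("species A", -1.341), ("species B", -2.060)]:
    print(label, round(inv_logit_percent(mean_logit), 1))
```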

TABLE 8.12
Multi-Stratum ANOVA Table for the Logit-Transformed Percentage of Resistant
Seedlings from the Pathogenicity Screening Experiment with Three Blocks (Factor
Rep) of 13 Trays (Factor Tray), Two Species (Factor Species) and Several Isolates
(Factor Isolate) per Species (Example 8.3A)

Source of Variation    df    Sum of Squares    Mean Square    Variance Ratio      P
Rep stratum
  Residual              2         1.8664          0.9332      6.492               0.006
Rep.Tray stratum
  Species               1         4.2896          4.2896      F^S = 29.841        < 0.001
  Species.Isolate      11        24.0655          2.1878      F^S.I = 15.219      < 0.001
  Residual             24         3.4500          0.1437
Total                  38        33.6715

One way to avoid the generation of effects for treatment groups that are absent is to use
a factor (called AllMembers, say) to label the full set of members across all groups (e.g. like
the combined levels of factors Isolate and Species given in Table 8.10). The explanatory
component can then be specified as

Explanatory component: [1] + Group + AllMembers

The first term identifies the parental effects, as before, and the second term identifies all
of the nested combinations present. This specification gives the same predictions and
ANOVA table, and it avoids the generation of absent combinations, but is not completely
satisfactory because the nested structure is no longer apparent in the form of the model.
For this reason, we prefer the nested specification.

Using either specification, we encounter a slight complication if any group has only one
member, as then the parental and nested effects for that individual refer to exactly the same
subset of observations and are aliased, i.e. it is not possible to separate the two effects. In
this case, the estimates can still be calculated as above and the first of the two terms fitted,
i.e. the parental or Group effect, estimates the combined group and member effect. There
is then no additional information left to contribute to the nested Group.Member effect,
which is estimated as zero.

The presence of a statistically significant variance ratio for a nested term (as in Example
8.3A) is evidence that the nested effects are not all equal to zero (the null hypothesis). In this
context, it is possible that the variation between members is present within some groups but
not others, and it may be relevant to identify these groups. We can achieve this by splitting
the Group.Member sum of squares into separate terms corresponding to the different groups.
We can do this by defining a new set of factors, one for each group, here called Set1, Set2 and
so on. The new factor for the rth group has levels 1 to $t_r$, corresponding to the members of that
group, and adds an extra level, for example, $t_r + 1$, for members of other groups. For example,
with only two groups, the explanatory component of the model can then be written as

Explanatory component: [1] + Group/(Set1 + Set2)

                     = [1] + Group + Group.Set1 + Group.Set2

This specification introduces absent combinations into the model, for example, the factors
are constructed so that there are no members of the second group with level 1 in factor Set1.
We can ignore these combinations in calculating estimates although they will be generated
(with value zero or missing) by some statistical software. And again we can reduce the
number of missing combinations by writing the explanatory component as

Explanatory component: [1] + Group + Set1 + Set2

which gives an equivalent model but no longer emphasizes the nested structure. Both
specifications have aliasing present between the individual group effects and the extra levels for
each set. As the Group term is fitted first, there is no information left on the aliased levels
in the Group.Set (or Set) terms, which are estimated as zero. The parental effects and each
set of nested effects are estimated as outlined previously. This is illustrated in Example 8.3B.



EXAMPLE 8.3B: SCREENING FOR PATHOGENICITY*

Table 8.13 shows the definition of factors TypeA, which labels individual isolates 1-9 
within species A (with level 10 for isolates from species B), and TypeB, which labels 
isolates 1-4 within species B (with level 5 for isolates from species A). These factors are 
also listed in file brassica.dat. 
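
The construction of these within-group factors can be sketched as follows. This is an
illustrative Python fragment, not a reading of file brassica.dat: the helper name
within_group_factor and the variables species and isolate are introduced here, and the
13 species-by-isolate combinations are simply listed once each.

```python
def within_group_factor(species, isolate, group, n_members):
    """Label members of `group` by isolate number; pool all other units in level n_members + 1."""
    return [iso if sp == group else n_members + 1 for sp, iso in zip(species, isolate)]


# One entry per species-by-isolate combination: 9 isolates from A, 4 from B.
species = ["A"] * 9 + ["B"] * 4
isolate = list(range(1, 10)) + list(range(1, 5))

type_a = within_group_factor(species, isolate, "A", 9)  # extra level 10 for species B
type_b = within_group_factor(species, isolate, "B", 4)  # extra level 5 for species A

for row in zip(species, isolate, type_a, type_b):
    print(row)
```

The extra level in each factor is what produces the aliasing with the Species term described
above: all units sharing that level belong to a single species.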

A within-group nested model for resistance scores could be written in symbolic form as 
Response variable: Logitp 

Explanatory component: [1] + Species/(TypeA + TypeB) 

Structural component: Rep/Tray 

The groups from the combination of factors Species and TypeA are the individual isolates
within species A plus the whole set of isolates from species B. The Species.TypeA
effects for species A are therefore equal to the previous nested effects for this group (Table
8.10), and those for species B are zero (Table 8.13). The Species.TypeA sum of squares is
then calculated from these estimates, and hence has zero contribution from species B,
and so quantifies variation about the mean within species A only. The Species.TypeA
mean square can be used to test the null hypothesis that all of the nested effects within
species A are equal to zero. A similar argument follows for the Species.TypeB term.
The ANOVA for this model based on the logit-transformed percentages is in Table 8.14.
Again, the sums of squares correspond to sums of squared estimated effects (from Table
8.13) taken over all units.

TABLE 8.13
Calculation of Nested Effects for Each Type of Isolate within Each Species (Example 8.3B)

Species   TypeA   TypeB   Treatment   Species   Species.TypeA     Species.TypeB
                          Effects     Effects   Nested Effects    Nested Effects
A          1       5       0.011       0.221      -0.210             0
A          2       5      -0.342       0.221      -0.563             0
A          3       5      -0.101       0.221      -0.322             0
A          4       5      -0.083       0.221      -0.304             0
A          5       5      -0.415       0.221      -0.636             0
A          6       5       1.777       0.221       1.555             0
A          7       5       0.418       0.221       0.197             0
A          8       5       1.835       0.221       1.613             0
A          9       5      -1.110       0.221      -1.331             0
B         10       1      -0.715      -0.498       0                -0.218
B         10       2      -0.565      -0.498       0                -0.068
B         10       3      -0.101      -0.498       0                 0.397
B         10       4      -0.609      -0.498       0                -0.112

TABLE 8.14
Multi-Stratum ANOVA Table for the Logit-Transformed Percentage of Resistant
Seedlings from the Pathogenicity Screening Experiment Using a Within-Group
Nested Structure (Factors TypeA and TypeB for the Two Species, Respectively)
(Example 8.3B)

Source of Variation    df    Sum of Squares    Mean Square    Variance Ratio      P
Rep stratum
  Residual              2         1.8664          0.9332      6.492               0.006
Rep.Tray stratum
  Species               1         4.2896          4.2896      F^S = 29.841        < 0.001
  Species.TypeA         8        23.4012          2.9251      F^S.TA = 20.349     < 0.001
  Species.TypeB         3         0.6643          0.2214      F^S.TB = 1.541      0.230
  Residual             24         3.4500          0.1437
Total                  38        33.6715

The sum of squares for the Species main effect has not changed, as expected.
The sum of squares and df for the term Species.Isolate from the previous analysis (Table
8.12) have both been partitioned into components for Species.TypeA and Species.TypeB.
The variance ratio for Species.TypeA shows strong evidence of variation between isolates
within species A ($F^{S.TA}_{8,24} = 20.349$, P < 0.001) but that for Species.TypeB gives no
evidence of variation between isolates within species B ($F^{S.TB}_{3,24} = 1.541$, P = 0.230). We can
therefore conclude that there was some variation in resistance to isolates from species A,
but no significant variation in resistance to isolates from species B.
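
The partition of the nested sum of squares, and the variance ratios quoted for each species,
can be verified directly from the entries of Tables 8.12 and 8.14. The following Python
fragment is a minimal sketch using the printed (rounded) values; the variable names are
introduced here for illustration, and P-values are not computed.

```python
# Residual line of the Rep.Tray stratum (Table 8.14).
RES_SS, RES_DF = 3.4500, 24

# (sum of squares, df) for the two within-species terms (Table 8.14).
terms = {"Species.TypeA": (23.4012, 8), "Species.TypeB": (0.6643, 3)}

res_ms = RES_SS / RES_DF
variance_ratio = {name: (ss / df) / res_ms for name, (ss, df) in terms.items()}

# The two components should add up to the Species.Isolate line of Table 8.12.
partition_ss = sum(ss for ss, _ in terms.values())
partition_df = sum(df for _, df in terms.values())

print(variance_ratio)
print(round(partition_ss, 4), partition_df)
```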

We have shown above how to identify and express a nested structure, including the 
attribution of variation between members to the individual parental groups. This prin- 
ciple can be applied to more complex explanatory structures. For example, consider an 
experiment which uses a treatment factor crossed with a nested structure, for example, 
Treatment*(Group/Member). In the context of Example 8.3, this structure would arise if 
each isolate had been tested with two treatments. 



8.5 Adding Controls or Standards to a Set of Treatments 

Many experiments include one or more control or standard treatments, and these can 
play several different roles. The concepts of negative and positive controls were briefly 
introduced in Section 3.1, and both are often intended as validation of the experimental 
process. A negative control is usually a null treatment that is included as a measure of 
baseline response, often used to demonstrate that other treatments have had a real effect. 
For example, consider a trial set up in glasshouse compartments to evaluate the effect of 
some new biocontrol agents on a glasshouse pest. In this case the negative control is a null 
treatment. If infestation in untreated compartments is small, then the experiment may be 
regarded as unsuccessful, as there is little scope to show any effect of the new agents. If, 
on the other hand, infestation is large in untreated compartments, then any effect of the 
new agents is more likely to be observed. A positive control is usually a treatment with a 
known effect that is included as a baseline for a good response. In our example, this might 
be an effective chemical control strategy. Finally, standard treatments may be defined for 
certain types of experiment and included as a means of comparing the response across 
several experiments and of providing a common reference point across experiments. This 
practice is common in variety trials, where some varieties are included in all trials across 
several years, with this standard set slowly evolving to reflect current elite varieties. It is 
also common in many laboratory procedures, where the standards are samples that are 
re-used either within or across experiments as quality controls, and might not have been 
part of the original experiment. The advantage of this approach is that behaviour of the 
standard is well-known, so any deviation from the expected response on these samples 
can give an immediate indication of problems in the experimental procedure. In this sec- 
tion, we use the term 'control' to also refer to standard treatments. 

The correct approach to analysis when controls are present depends on both the aims 
of the experiment and the purpose of the control. If the main purpose of the experiment 
is the direct comparison of individual treatments with the controls, for example, when 
screening a set of chemicals or varieties as to whether they are comparable to one or more 
positive controls, or better than a negative control, then the controls can be regarded as 
an integral part of an unstructured set of treatments, and comparisons can be made as 
described in Section 8.8.4. If the main purpose of the experiment is comparisons between 
non-control treatments, then the approaches described in the remainder of this section 
might be helpful. This often requires the definition of a complex explanatory model con- 
taining both crossed and nested structures. 
EXAMPLE 8.4A: POTATO YIELDS*

This experiment was introduced in Example 3.5 and the data were analysed according
to a one-way treatment structure in Example 7.1. It consisted of a RCBD with four blocks
to compare the yields of potatoes treated with four different fungicide sprays (F1, F2, F3,
F4) with an unsprayed treatment (Control, negative control). The layout was shown in
Table 7.1 (data in file potato.dat). The analysis in Chapter 7 showed differences between
the treatments, with the control giving smaller yields than the four fungicide sprays.

It would be useful to refine this analysis specifically to evaluate whether there are any
differences in yield between the fungicide sprays.

We recommend that controls always be included in the analysis of an experiment, except
in the special case where the controls are uninformative. This might be the case if the
control is not to be compared with any other treatment and the background variation within
the control is quantitatively different from that of other treatments. For example, consider
a glasshouse trial designed to test the resistance of several varieties to a fungal disease,
where inoculum has been sprayed onto the leaves to provide a consistent infection. Two
types of negative control have been included in the trial: a susceptible variety sprayed with
inoculum to show that conditions are suitable for disease progression, and the same
variety sprayed with clean water to show that there has been no additional infection or
cross-infection during the trial. If the experiment is successful then all of the plants sprayed
with water should show no sign of disease and have a consistent zero response. Because
there is no variation within this group, including these plants in the statistical analysis
will decrease the ResMS so that it underestimates the true extent of background variation.
It is therefore legitimate to exclude these plants from the analysis. However, plants of the
susceptible variety sprayed with inoculum should be retained in the analysis, because
they provide real quantitative information on the biological system. An assessment as to
whether controls are informative or not must be done on a case-by-case basis, and requires
real understanding of both the experimental system and the statistical analysis.

Having decided to retain the controls within the statistical analysis, we need to decide
which comparisons are of most interest. If the controls and treatments are analysed as
a single unstructured set, i.e. labelled by a single factor, then the one-way ANOVA will
provide only an overall test of variation within the full set. Treatment differences can then
be extracted from the pairwise comparisons of predicted means, but there are dangers in
this approach that are described in Section 8.8. Where the controls are expected to be
substantially different from the treatments, or where comparisons within the set of treatments
are the main purpose of the experiment, it can be helpful to partition the joint variation
within the full set of treatments and controls into two components: one accounting for
variation between the controls and the average treatment effect, and the other accounting
for variation within the set of treatments. In the case of a single control, we start with the
model in terms of the full set of control and treatment effects as presented in Equation 8.1,
except that we allocate the first label, j = 1, to the control and allocate the remaining labels
to the t - 1 non-control treatments. The treatment effects are then partitioned with a nested
structure as



$$\tau_j = \gamma_r + \delta(\gamma)_{rj} ,$$

where index r takes value 1 when j = 1 (control) and takes value 2 otherwise (treated). This
requires the definition of a new factor, here denoted Type, that labels the control and
treatment sets. The explanatory model for this structure can then be written as
Explanatory component: [1] + Type/Treatment

= [1] + Type + Type.Treatment 

where the factor Treatment denotes the full unstructured set of treatment groups. The
process of estimating effects and sums of squares is exactly the same as for the nested
structures in Section 8.5, and the same issues of missing combinations and aliasing arise.

EXAMPLE 8.4B: POTATO YIELDS* 

The nested control structure can be represented by two factors (Type and Fungicide) as 
defined in Table 8.15. These factors are also given in file potato.dat. 

The model can then be written in symbolic form as 

Response variable: Yield 

Explanatory component: [1] + Type/Fungicide
Structural component: Block/Plot

This model can equivalently be written in mathematical form as

$$Yield_{irs} = \mu + Block_i + Type_r + Fungicide(Type)_{rs} + e_{irs} ,$$

where $Yield_{irs}$ is the yield in the ith block (i = 1 ... 4) for treatment of type r (r = 1 for
control, r = 2 for fungicide treatments) with the sth fungicide (s = 1 ... $t_r$, with $t_1 = 1$ and
$t_2 = 4$). Fungicide Control is of type 1 (Control), and fungicides F1, F2 ... F4 are of type 2
(Treated) as shown in Table 8.15, where we have omitted parameters corresponding to
missing combinations. The structural component generates the block effects, denoted
$Block_i$ for i = 1 ... 4, and the deviations $e_{irs}$ (equivalent to term Block.Plot). The parental
effect of the rth type (control or treated) is denoted $Type_r$, with $Fungicide(Type)_{rs}$ being the
nested effect of the sth fungicide within the rth type.

The Type effects are estimated as the means of the control and treated groups. The
Type.Fungicide groups with data present are the control treatment plus the individual
fungicide treatments. The nested effect for the control is aliased with the control group
parental effect, so the control nested effect is equal to zero. The nested effects for the
fungicide treatments are differences from their group mean. The Type.Fungicide sum
of squares is then calculated from these estimates, and hence has zero contribution from
the control and so quantifies variation about the mean within the fungicide treatments
only, as required.

The resulting ANOVA table is Table 8.16. As expected, the variance ratio for factor
Type ($F^{T}_{1,12} = 35.972$, P < 0.001) gives strong evidence of a difference between the
control and average of the fungicide treatments. The variance ratio for the nested term
Type.Fungicide ($F^{T.F}_{3,12} = 0.778$, P = 0.529) gives no evidence of any differences among
the four fungicide treatments. This gives a quantitative confirmation of the tentative
conclusions of Example 7.1D.
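
The 'control vs treated' partition can be reproduced from the estimated treatment effects
listed in Table 8.15. The Python sketch below is illustrative only: the dictionary and variable
names are introduced here, the residual mean square is copied from Table 8.16, and P-values
are not computed.

```python
# Estimated treatment effects (Table 8.15) and the Type label of each treatment.
treatment_effects = {"Control": -158.3, "F1": 4.7, "F2": 49.7, "F3": 66.2, "F4": 37.7}
type_of = {"Control": "Control", "F1": "Treated", "F2": "Treated",
           "F3": "Treated", "F4": "Treated"}
N_BLOCKS = 4       # each treatment occurs once per block
RES_MS = 3483.07   # residual mean square from Table 8.16

# Parental (Type) effects: mean of the treatment effects within each type.
members = {}
for trt, typ in type_of.items():
    members.setdefault(typ, []).append(trt)
type_effect = {typ: sum(treatment_effects[t] for t in trts) / len(trts)
               for typ, trts in members.items()}

# Nested effects: deviation of each treatment from its type effect.
nested_effect = {t: treatment_effects[t] - type_effect[type_of[t]]
                 for t in treatment_effects}

# Sums of squares are squared effects summed over all plots.
type_ss = N_BLOCKS * sum(type_effect[type_of[t]] ** 2 for t in treatment_effects)
nested_ss = N_BLOCKS * sum(e ** 2 for e in nested_effect.values())
type_df = len(members) - 1
nested_df = len(treatment_effects) - len(members)

type_vr = (type_ss / type_df) / RES_MS
nested_vr = (nested_ss / nested_df) / RES_MS
print(round(type_ss, 2), round(nested_ss, 2), round(type_vr, 3), round(nested_vr, 3))
```

The control contributes a nested effect of exactly zero, which is the aliasing described
above, so the nested sum of squares reflects only variation among the four fungicides.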



TABLE 8.15
Calculation of Type Parental Effects, and Nested Type.Fungicide Effects (Example 8.4B)

Fungicide   Type      Treatment   Type       Type.Fungicide
                      Effects     Effects    Effects
Control     Control    -158.3      -158.3        0
F1          Treated       4.7        39.6      -34.9
F2          Treated      49.7        39.6       10.1
F3          Treated      66.2        39.6       26.6
F4          Treated      37.7        39.6       -1.9
TABLE 8.16
Multi-Stratum ANOVA Table for RCBD Potato Yield Trial with Treatment
Effects (Factor Fungicide) Partitioned into 'Control vs Treated' (Factor Type)
Plus Nested Variation among Fungicide Treatments (Type.Fungicide)
(Example 8.4B)

Source of Variation    df    Sum of Squares    Mean Square    Variance Ratio     P
Block stratum
  Residual              3       14,987.20         4995.73     1.434              0.283
Block.Plot stratum
  Type                  1      125,294.45      125,294.45     F^T = 35.972       < 0.001
  Type.Fungicide        3         8124.75         2708.25     F^T.F = 0.778      0.529
  Residual             12       41,796.80         3483.07
Total                  19      190,203.20

This approach can be extended for more complex treatment structures with one or 
more controls, such as a factorial structure with added control. This type of structure may 
require a mixture of nested effects (to partition out the control) and crossed effects (to 
model the factorial structure) to extract information efficiently. If more than one control is 
present, then the structure can be extended in several different ways. If comparisons with 
these controls are unimportant, then it is sufficient to add one extra level to the Type factor 
for each type of control. The Type factor then evaluates differences among the individual 
controls and the average of the other treatments, and the Type.Treatment interaction evalu- 
ates variation among the non-control treatments. 



8.6 Investigating Specific Treatment Comparisons 

In previous sections, we have defined new factors to enable partitioning of a set of
structured treatment effects into meaningful comparisons, often with the aim of finding the
simplest possible description of patterns within the set. Contrasts provide an alternative
way of partitioning a set of treatment effects. A contrast translates a specific hypothesis
about treatment effects into mathematical form. There are two approaches to dealing with
contrasts. The first approach involves building the contrast into the ANOVA and the
second involves evaluating contrasts from tables of predicted means. In this section we use
the former approach, and the latter is discussed in Section 8.8.

For example, consider an experiment set up as a RCBD with three blocks, investigating
the resistance of six wheat varieties to virus transmission by aphids, measured in terms of
virus concentration in the plant at the end of the experiment. A model for these data can
be written in mathematical form as

$$y_{ij} = \mu + b_i + \tau_j + e_{ij} , \qquad (8.4)$$

where $b_i$ is the effect of the ith block, i = 1, 2, 3, $\tau_j$ is the effect of the jth variety, j = 1 ... 6,
and all other terms are as defined previously. Throughout this section, we assume that
we are using sum-to-zero constraints, so that $\tau_j$ represents the deviation from the overall
mean due to the effect of the jth variety. If the first two varieties are related through a
known resistant ancestor, it might be of particular interest to evaluate whether there
is any difference in resistance between them. This question can be expressed as 'Is the
treatment effect for variety 1 equal to that for variety 2?' so, in mathematical terms, we
want to test the proposition $H_0$: $\tau_1 = \tau_2$. In practice, we rewrite this in a form such that,
if the null hypothesis is true, then the value is equal to zero, which means reformulating
the proposition as $H_0$: $\tau_1 - \tau_2 = 0$; this is now in the form of a linear contrast. If we
build this contrast into our analysis, we can form a test for this hypothesis as part of our
ANOVA table.

In general, and working in terms of a set of treatment effects $\tau_1 ... \tau_t$, a linear contrast,
denoted $\psi$, is defined as a linear function of the treatment effects, i.e. of the form

$$\psi = l_1\tau_1 + l_2\tau_2 + \dots + l_t\tau_t = \sum_{j=1}^{t} l_j\tau_j ,$$

such that the sum of the contrast coefficients, the set $l_j$ for j = 1 ... t, is equal to zero, i.e.
$\sum_j l_j = 0$. In our example above, $\psi = \tau_1 - \tau_2$ with $l_1 = 1$, $l_2 = -1$ and $l_3 = l_4 = \dots = l_6 = 0$.



EXAMPLE 8.4C: POTATO YIELDS* 

In the potato yield trial described in Example 8.4A, four fungicide treatments were
tested with a negative control (no fungicide treatment). However, fungicides F1 and
F4 use one mode of action (mode A) and fungicides F2 and F3 use another (mode B),
and it is of interest to evaluate whether there is any overall difference between the
two modes of action. The linear model for this RCBD trial is equivalent to Equation
8.4 with four replicates and five treatment effects $\tau_1 ... \tau_5$ referring to the control and
fungicides F1 ... F4, respectively. Equality of the two modes of action can be expressed
in words as 'Is the average effect of mode A fungicides equal to the average effect of
mode B fungicides?'. The average effect of mode A fungicides is the average of the
effects associated with F1 and F4, or $\tfrac{1}{2}(\tau_2 + \tau_5)$. Similarly, the average effect of mode B
fungicides (F2 and F3) is equal to $\tfrac{1}{2}(\tau_3 + \tau_4)$. Equality between the two quantities can
then be written as

$$\tfrac{1}{2}(\tau_2 + \tau_5) = \tfrac{1}{2}(\tau_3 + \tau_4) .$$

We can rearrange this expression into a contrast by subtracting $\tfrac{1}{2}(\tau_3 + \tau_4)$ from both
sides of the equation, to obtain

$$\tfrac{1}{2}(\tau_2 + \tau_5) - \tfrac{1}{2}(\tau_3 + \tau_4)
  = (0 \times \tau_1) + \left(\tfrac{1}{2} \times \tau_2\right) - \left(\tfrac{1}{2} \times \tau_3\right)
    - \left(\tfrac{1}{2} \times \tau_4\right) + \left(\tfrac{1}{2} \times \tau_5\right) = 0 .$$

In this case, the contrast coefficients are: $l_1 = 0$, $l_2 = l_5 = 0.5$, $l_3 = l_4 = -0.5$.

Because the true values of the treatment effects are unknown, so too is the true value of
the contrast. The least-squares estimate is obtained by substitution of the estimated
treatment effects in place of the unknown true values, so

$$\hat\psi = l_1\hat\tau_1 + l_2\hat\tau_2 + \dots + l_t\hat\tau_t = \sum_{j=1}^{t} l_j\hat\tau_j .$$
For a RCBD or CRD with equal replication of all treatment groups, the variance of the
contrast is equal to the sum of the squared coefficients multiplied by the background
variance and divided by the replication, which is written as

$$\mathrm{Var}(\hat\psi) = \frac{\sigma^2}{n} \sum_{j=1}^{t} l_j^2 .$$
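
The form of this variance can be motivated by a short derivation. The following is a sketch
under the usual assumptions of independent deviations with common variance $\sigma^2$ and
equal replication n; the notation $\bar{y}_{\cdot j}$ (mean of the n observations on treatment j),
$\bar{y}_{\cdot\cdot}$ (overall mean) and $\bar{e}_{\cdot j}$ (mean deviation for treatment j) is introduced
here and is not taken from the text.

```latex
\begin{align*}
\hat\psi &= \sum_{j=1}^{t} l_j \hat\tau_j
          = \sum_{j=1}^{t} l_j \left(\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot}\right)
          = \sum_{j=1}^{t} l_j \bar{y}_{\cdot j}
          && \text{because } \textstyle\sum_j l_j = 0 , \\
         &= \sum_{j=1}^{t} l_j \tau_j + \sum_{j=1}^{t} l_j \bar{e}_{\cdot j}
          && \text{the mean and block terms are common to all } j \text{ and cancel} , \\
\mathrm{Var}(\hat\psi) &= \sum_{j=1}^{t} l_j^2 \, \mathrm{Var}\!\left(\bar{e}_{\cdot j}\right)
          = \frac{\sigma^2}{n} \sum_{j=1}^{t} l_j^2 .
\end{align*}
```

The zero-sum condition on the coefficients is what removes the overall mean and, in a RCBD,
the block effects from the estimate.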



As usual, we estimate the unknown background variation, $\sigma^2$, using $s^2$, the residual mean
square from the ANOVA table. The estimated contrast standard error, $\mathrm{SE}(\hat\psi)$, is calculated
as the square root of its estimated variance. Under the null hypothesis that the true value
of the contrast is equal to zero, i.e. $H_0$: $\psi = 0$, the ratio of the contrast to its estimated
standard error, i.e.

$$\frac{\hat\psi}{\mathrm{SE}(\hat\psi)} ,$$

has a t-distribution with df equal to the ResDF from the ANOVA table. For a two-sided test,
if the absolute value of the ratio exceeds the $100(1 - \alpha_s/2)$th percentile of this t-distribution,
then there is statistical evidence (at significance level $\alpha_s$) that the true value of the contrast
is not equal to zero. The associated $100(1 - \alpha_s)$% confidence interval can be formed as

$$\hat\psi \pm \left(t_{\mathrm{ResDF}} \times \mathrm{SE}(\hat\psi)\right) ,$$

where $t_{\mathrm{ResDF}}$ denotes that percentile.



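We can construct an equivalent test by partitioning the TrtSS in the ANOVA table into a
component corresponding to the contrast and a remainder. The contrast sum of squares
can be written as

$$\mathrm{SS}(\hat\psi) = n \left(\sum_{j=1}^{t} l_j\hat\tau_j\right)^{2} \Bigg/ \left(\sum_{j=1}^{t} l_j^2\right)
  = s^2 \, \hat\psi^{2} \Big/ \left[\mathrm{SE}(\hat\psi)\right]^{2} .$$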






As the contrast sum of squares has 1 df, it is equal to the contrast mean square. Under
the null hypothesis, the variance ratio of the contrast mean square to the ResMS has an
F-distribution with numerator df equal to 1 and denominator df equal to the ResDF. The
portion of TrtSS left over is called the remainder sum of squares. Under the null hypothesis
that the contrast has accounted for all of the treatment variation, the variance ratio of the
remainder mean square to the ResMS has an F-distribution on t - 2 and ResDF df. If there
is no evidence of variation in the remainder, then the contrast alone can be used to describe
treatment differences.

Both the ratio of the contrast to its SE and the contrast sum of squares are invariant to
re-scaling, for example, if the coefficients for a contrast are all multiplied by 2, the ratio and
contrast sum of squares are unchanged. To simplify computation, some software packages
therefore automatically standardize contrasts by re-scaling so that the sum of the squared
contrast coefficients is equal to 1, i.e. $\sum_j l_j^2 = 1$.
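
The invariance to re-scaling can be illustrated numerically. The Python sketch below uses
the estimated treatment effects from Table 8.15 and a 'mode A vs mode B' comparison written
with integer coefficients; the function names standardize and contrast_ss are introduced here
for illustration and do not correspond to any particular software package.

```python
import math


def standardize(coefs):
    """Re-scale contrast coefficients so that their squares sum to 1."""
    scale = math.sqrt(sum(l * l for l in coefs))
    return [l / scale for l in coefs]


def contrast_ss(coefs, effects, n_reps):
    """Contrast sum of squares for equal replication: n * estimate^2 / sum(l^2)."""
    estimate = sum(l * e for l, e in zip(coefs, effects))
    return n_reps * estimate ** 2 / sum(l * l for l in coefs)


effects = [-158.3, 4.7, 49.7, 66.2, 37.7]  # Control, F1, F2, F3, F4 (Table 8.15)
raw = [0, 1, -1, -1, 1]                    # mode A (F1, F4) minus mode B (F2, F3)
std = standardize(raw)

print(std)
print(contrast_ss(raw, effects, 4), contrast_ss(std, effects, 4))
```

The estimate itself does change with the scaling, so an estimate reported by software that
standardizes contrasts has to be interpreted on the standardized scale.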
EXAMPLE 8.4D: POTATO YIELDS*

The estimated treatment effects for this trial were shown in the third column of Table
8.15. To compare the fungicide modes of action we calculate the contrast using $l_1 = 0$,
$l_2 = l_5 = 0.5$ and $l_3 = l_4 = -0.5$. This contrast has $\sum_j l_j^2 = 1$ and is estimated as

$$\begin{aligned}
\hat\psi &= (0 \times -158.3) + (0.5 \times 4.7) + (-0.5 \times 49.7) + (-0.5 \times 66.2) + (0.5 \times 37.7) \\
         &= 0 + 2.35 - 24.85 - 33.10 + 18.85 \\
         &= -36.75 ,
\end{aligned}$$

with estimated variance equal to $s^2/n$ = 3483/4 = 870.75. The contrast SE is then
equal to the square root of its variance at 29.51. The ratio of the contrast to its SE is
-36.75/29.51 = -1.245. Compared to a t-distribution on 12 df, this gives P = 0.237 for a
two-sided test. The contrast sum of squares is then 4 x (-36.75)^2 = 5402.25, and results in
the same conclusion as the ANOVA shown in Table 8.17 ($F^{AvsB}_{1,12} = 1.551$, P = 0.237). Hence,
there is no evidence of any difference in yield between fungicides with different modes
of action. The remainder mean square ($F^{Rem}_{3,12} = 12.251$, P < 0.001) indicates the presence
of treatment variation not accounted for by this contrast.
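
The arithmetic of this example can be reproduced with the short Python sketch below. The
inputs are the values printed in Tables 8.15 and 8.17; the variable names are introduced
here for illustration, and the P-values are not computed because that would require
t- and F-distribution functions beyond the standard library.

```python
import math

effects = [-158.3, 4.7, 49.7, 66.2, 37.7]  # Control, F1, F2, F3, F4 (Table 8.15)
coefs = [0.0, 0.5, -0.5, -0.5, 0.5]        # mode A (F1, F4) vs mode B (F2, F3)
n_reps = 4                                 # four blocks
res_ms = 3483.07                           # residual mean square (Table 8.17)
trt_ss, trt_df = 133419.20, 4              # Fungicide line of Table 8.17

estimate = sum(l * e for l, e in zip(coefs, effects))
sum_sq_coefs = sum(l * l for l in coefs)

se = math.sqrt(res_ms * sum_sq_coefs / n_reps)      # estimated standard error
t_ratio = estimate / se                             # compare with t on 12 df
contrast_ss = n_reps * estimate ** 2 / sum_sq_coefs
contrast_vr = contrast_ss / res_ms                  # 1 df, so SS equals MS

# Whatever is left of the treatment sum of squares forms the remainder.
remainder_ms = (trt_ss - contrast_ss) / (trt_df - 1)
remainder_vr = remainder_ms / res_ms

print(round(estimate, 2), round(se, 2), round(t_ratio, 3))
print(round(contrast_ss, 2), round(contrast_vr, 3), round(remainder_vr, 3))
```

The squared t ratio equals the contrast variance ratio, which is why the two tests give the
same P-value.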

Typically there are two or more comparisons of interest, generating a number of different
contrasts. We label the ith contrast as $\psi_i$, with contrast coefficients $l_{i1} ... l_{it}$. In this
situation, the concept of orthogonality becomes important, because it affects the interpretability
of the contrasts. We construct the product of two contrasts, here denoted $\psi_i \times \psi_k$, by taking
the pair of coefficients relating to each treatment effect, multiplying these together, and
summing over all treatment effects, so

$$\psi_i \times \psi_k = \sum_{j=1}^{t} l_{ij} l_{kj} .$$

Two contrasts are said to be orthogonal contrasts if their product is zero. Orthogonal
contrasts are also statistically independent with zero covariance.

In general, the sum of squares associated with a set of t treatment groups with t - 1 df
can be partitioned into t - 1 orthogonal contrasts each with 1 df. The use of orthogonal
contrasts has the advantage that the ANOVA table is invariant to the order in which contrasts
are added into the model, and different information is contributing to each contrast.

TABLE 8.17
Multi-Stratum ANOVA Table for Potato Yields with Treatment Effects (Factor Fungicide)
Partitioned into a Contrast to Compare Fungicides of Modes A and B Plus a Remainder
(Example 8.4D)

Source of Variation               df    Sum of Squares    Mean Square    Variance Ratio      P
Block stratum
  Residual                         3       14,987.20         4995.73     1.434               0.283
Block.Plot stratum
  Fungicide                        4      133,419.20       33,354.80     9.576               0.001
    Contrast: mode A vs mode B     1         5402.25         5402.25     F^AvsB = 1.551      0.237
    Remainder                      3      128,016.95       42,672.32     F^Rem = 12.251      < 0.001
  Residual                        12       41,796.80         3483.07
Total                             19      190,203.20

EXAMPLE 8.4E: POTATO YIELDS* 

The analysis in Example 8.4D ignored our previous partitioning of the control as
separate from the fungicide treatments. We can reintroduce this partition via a
contrast that compares the control with the mean of the fungicide treatments, as
$\psi_1 = \tau_1 - \tfrac{1}{4}(\tau_2 + \tau_3 + \tau_4 + \tau_5)$. We denote our previous contrast for comparison of modes
as $\psi_2$. The coefficients for each of the contrasts $\psi_1$ and $\psi_2$ are shown in Table 8.18. Their
product is calculated as

$$\psi_1 \times \psi_2 = (1 \times 0) + (-0.25 \times 0.5) + (-0.25 \times -0.5) + (-0.25 \times -0.5) + (-0.25 \times 0.5) = 0$$

and so these two contrasts are orthogonal. Furthermore, we can make comparisons
between fungicides within each mode of action using contrast $\psi_3 = \tau_2 - \tau_5$ to compare
F1 with F4 and contrast $\psi_4 = \tau_3 - \tau_4$ to compare F2 with F3. The coefficients for these
contrasts are also in Table 8.18, and it is straightforward to verify that any pair of these
four contrasts is orthogonal.

Table 8.19 is the ANOVA table with the TrtSS partitioned into single df terms for the
contrasts fitted in order $\psi_1$, $\psi_2$, $\psi_3$, $\psi_4$, and it is straightforward to verify that the contrast
sums of squares do not change if these contrasts are fitted in a different order. Each
contrast is independently summarizing a different aspect of the treatment information.
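
Both claims, pairwise orthogonality and the complete partition of the treatment sum of
squares, can be checked with the Python sketch below. The coefficients are those of Table
8.18 and the treatment effects those of Table 8.15; the dictionary keys psi1 ... psi4 and
the helper product are names introduced here for illustration.

```python
from itertools import combinations

# Contrast coefficients for Control, F1, F2, F3, F4 (Table 8.18).
contrasts = {
    "psi1": [1, -0.25, -0.25, -0.25, -0.25],  # control vs mean of fungicides
    "psi2": [0, 0.5, -0.5, -0.5, 0.5],        # mode A vs mode B
    "psi3": [0, 1, 0, 0, -1],                 # F1 vs F4 (within mode A)
    "psi4": [0, 0, 1, -1, 0],                 # F2 vs F3 (within mode B)
}
effects = [-158.3, 4.7, 49.7, 66.2, 37.7]     # estimated treatment effects (Table 8.15)
n_reps = 4


def product(a, b):
    """Sum of products of corresponding coefficients."""
    return sum(x * y for x, y in zip(a, b))


# Every pair of contrasts should have product zero.
all_orthogonal = all(
    abs(product(contrasts[a], contrasts[b])) < 1e-12
    for a, b in combinations(contrasts, 2)
)

# Single-df sums of squares; with t - 1 = 4 orthogonal contrasts they add up to the TrtSS.
ss = {name: n_reps * product(c, effects) ** 2 / product(c, c)
      for name, c in contrasts.items()}

print(all_orthogonal)
print({name: round(value, 2) for name, value in ss.items()}, round(sum(ss.values()), 2))
```

Because the sums of squares are computed one contrast at a time from the same effects, the
order in which the contrasts are listed has no influence on the result.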



TABLE 8.18
Coefficients for Four Orthogonal Treatment Contrasts for the Potato Yield
Trial (Example 8.4E)

              Control    F1       F2       F3       F4
              l_1        l_2      l_3      l_4      l_5
Contrast ψ1    1         -0.25    -0.25    -0.25    -0.25
         ψ2    0          0.5     -0.5     -0.5      0.5
         ψ3    0          1        0        0       -1
         ψ4    0          0        1       -1        0



TABLE 8.19

Multi-Stratum ANOVA Table for Potato Yields with Treatment Effects (Factor
Fungicide) Partitioned into Four Orthogonal Contrasts: $\psi_1 \dots \psi_4$ (Example 8.4E)

Source of Variation      df   Sum of Squares   Mean Square   Variance Ratio               P
Block stratum
  Residual                3      14,987.20        4995.73     1.434                       0.283
Block.Plot stratum
  Fungicide               4     133,419.20      33,354.80     9.576                       0.001
    Contrast $\psi_1$     1     125,294.45     125,294.45     $F^{\psi_1}$ = 35.972     < 0.001
    Contrast $\psi_2$     1       5402.25         5402.25     $F^{\psi_2}$ = 1.551        0.237
    Contrast $\psi_3$     1       2178.00         2178.00     $F^{\psi_3}$ = 0.625        0.444
    Contrast $\psi_4$     1        544.50          544.50     $F^{\psi_4}$ = 0.156        0.699
  Residual               12      41,796.80        3483.07
Total                    19     190,203.20
The only contrast giving evidence against its null hypothesis is contrast $\psi_1$, which
compares the negative control with the fungicide treatments. In fact, contrast $\psi_1$ here is
equivalent to the use of factor Type in Example 8.4B, giving the same sum of squares
and variance ratio (Table 8.16). Contrast $\psi_1$ is estimated as -177.0 (SE 29.51), indicating
that the yield of the control treatment is 177.0 units less than the average yield of the
fungicide treatments with 95% CI calculated as $-177.0 \pm (29.51 \times 2.179) = (-241.3, -112.7)$. The
predictive model for this experiment can therefore be reduced to this contrast, agreeing
with the conclusion of Example 8.4C.
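
The arithmetic behind Table 8.19 and the confidence interval can be reproduced with a short Python sketch. All input values are copied from the example (sums of squares, residual mean square, estimate, SE and the critical t value of 2.179 for 12 df); the variable names are our own and nothing here refits the model.

```python
# Values copied from Table 8.19 (Example 8.4E).
res_ms = 3483.07  # residual mean square, 12 df
contrast_ss = {
    "psi1": 125294.45,
    "psi2": 5402.25,
    "psi3": 2178.00,
    "psi4": 544.50,
}

# The four single-df contrast sums of squares add up to the Fungicide SS.
trt_ss = sum(contrast_ss.values())

# Each contrast has 1 df, so its mean square equals its sum of squares and
# the variance ratio is SS / ResMS.
f_ratios = {name: ss / res_ms for name, ss in contrast_ss.items()}

# 95% confidence interval for contrast psi1, using the estimate, SE and
# critical t value quoted in the text.
estimate, se, t_crit = -177.0, 29.51, 2.179
half_width = se * t_crit
ci = (estimate - half_width, estimate + half_width)

print("TrtSS:", round(trt_ss, 2))
for name, f in f_ratios.items():
    print(name, round(f, 3))
print("95% CI:", tuple(round(v, 1) for v in ci))
```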

Factors with two levels can always be represented by an interpretable single contrast
constructed as the difference between the two levels, i.e. with contrast coefficients $l_1 = -1$
and $l_2 = 1$. Factors with t levels can be represented by t - 1 contrasts, but it is not always
possible to construct orthogonal contrasts that ask sensible questions about the treatments.
Sometimes it is reasonable to use contrasts to pick out a few comparisons of interest but
not decompose the remainder. Within a factorial structure it may be sensible to partition
one or more of the factors into one or more contrasts plus a remainder. This structure is
then propagated into interactions involving those factors. Both of these approaches are
illustrated in Example 8.5.



EXAMPLE 8.5: HERBICIDE EFFICACY

A factorial experiment was done to compare the general efficacy of three herbicides (factor
Herbicide) against nine populations of black-grass (factor Population). Two of the
herbicides (labelled A and C here) are from the same group (type 1 in factor Type), the third
(labelled B) is from a different group (type 2). The design was arranged as a RCBD with five
blocks (factor Rep), each containing 27 pots (dummy factor DPot). Six plants were grown
in each pot and their combined fresh weight (g, variate Fwt) was recorded at the end of the
study. The data are listed in Table 8.20 and held in file herbicide.dat. Preliminary analysis
indicated the need for a transformation of the fresh weight and the square root
transformation, calculated as sqrtFwt = sqrt(Fwt), gave reasonable residual plots. There is interest in
whether there is any systematic difference in herbicide effect both between and within the
herbicide groups, and in whether this changes across the populations.

Here, a crossed treatment structure is appropriate, as the main effects of both herbi- 
cide and population are of interest. The full model can be written in symbolic form as 

Response variable: sqrtFwt 

Explanatory component: [1] + Herbicide*Population

Structural component: Rep/DPot 

The mathematical model can be written as 

$y_{rsk} = \mu + \text{Rep}_k + \text{Herbicide}_r + \text{Population}_s + (\text{Herbicide.Population})_{rs} + e_{rsk}$ ,

where $y_{rsk}$ is the response in the $k$th block ($k = 1 \dots 5$) for the $r$th herbicide ($r = 1, 2, 3$ for
A, B, C) and the $s$th population ($s = 1 \dots 9$) with associated deviation $e_{rsk}$; $\mu$ is the overall
mean, $\text{Rep}_k$ is the effect of the $k$th block, $\text{Herbicide}_r$ is the main effect of the $r$th
herbicide, $\text{Population}_s$ is the main effect of the $s$th population and $(\text{Herbicide.Population})_{rs}$ is the
interaction between the $r$th herbicide and the $s$th population. Within the herbicide main
effect, the two types of herbicide can be compared using a contrast of the form

$\psi_1 = \tfrac{1}{2}(\text{Herbicide}_1 + \text{Herbicide}_3) - \text{Herbicide}_2$

$\phantom{\psi_1} = 0.5 \times \text{Herbicide}_1 - \text{Herbicide}_2 + 0.5 \times \text{Herbicide}_3$ .
TABLE 8.20

Fresh Weight (g) from a Pot Experiment Testing the Efficacy of Three Herbicides on Nine
Populations of Black-Grass Using a RCBD with Five Blocks (Example 8.5 and File herbicide.dat)

                                     Fresh Weight (g)
Population   Herbicide   Block 1   Block 2   Block 3   Block 4   Block 5
P1           A            5.94      3.63      5.56      4.09      3.65
P2           A            3.88      2.17      0.63      2.82      1.73
P3           A            3.55      5.16      5.17      1.07      2.61
P4           A            6.45      5.56      1.99      5.21      2.51
P5           A            0.10      0.31      3.69      4.56      0.16
P6           A            4.94      5.21      2.51      3.76      1.90
P7           A            4.07      3.74      4.67      3.41      5.73
P8           A            2.13      6.46      5.02      2.36      2.88
P9           A            2.24      2.85      0.63      1.39      3.14
P1           B            1.25      1.01      0.92      0.98      0.26
P2           B            1.55      1.44      0.90      1.12      1.74
P3           B            1.53      4.21      3.39      4.13      1.85
P4           B            2.56      1.49      1.09      2.37      0.66
P5           B            4.96      5.11      4.84      4.64      4.96
P6           B            1.89      3.00      3.05      1.22      1.58
P7           B            0.67      0.39      0.25      0.44      0.40
P8           B            0.47      0.51      0.37      0.27      0.40
P9           B            0.53      0.66      1.70      0.70      0.17
P1           C            5.37      3.96      4.05      3.37      4.11
P2           C            3.60      1.81      1.82      5.21      2.13
P3           C            4.77      3.46      5.58      4.45      2.80
P4           C            4.10      4.48      7.92      5.46      3.97
P5           C            2.96      5.14      3.16      4.46      2.45
P6           C            2.59      5.33      5.38      5.13      2.61
P7           C            5.17      5.66      4.84      4.47      3.44
P8           C            2.63      5.93      4.74      4.71      4.63
P9           C            4.48      2.90      2.91      5.73      3.71

Source: Data from R. Hull, Rothamsted Research. 



Similarly, herbicides A and C are compared via an orthogonal contrast of the form

$\psi_2 = \text{Herbicide}_1 - \text{Herbicide}_3$ .

Within the interaction, the contrasts are applied to each population as

$\psi_{1s} = 0.5 \times (\text{Herbicide.Population})_{1s} - (\text{Herbicide.Population})_{2s} + 0.5 \times (\text{Herbicide.Population})_{3s}$

$\psi_{2s} = (\text{Herbicide.Population})_{1s} - (\text{Herbicide.Population})_{3s}$

for $s = 1 \dots 9$, giving the estimates shown in Table 8.21. At this level, the interest is in
whether the value of the contrast varies between populations. For example, consistency
across populations in differences between types of herbicide corresponds to the null
hypothesis $H_0$: $\psi_{1s} = 0$ for $s = 1 \dots 9$.
TABLE 8.21

Estimated Effects and Contrasts for Main Effect (Herbicide and Population) and Interaction
(Herbicide.Population) Terms (Example 8.5)

               Estimated Effects, Herbicide (Type)      Main      Estimated Contrasts
Population     A(1)       B(2)       C(1)               Effect    Type 1 vs 2    A vs C
P1             0.340     -0.320     -0.020               0.039     0.480          0.360
P2            -0.069      0.188     -0.119              -0.231    -0.283          0.050
P3            -0.130      0.308     -0.178               0.199    -0.463          0.048
P4             0.097     -0.145      0.048               0.191     0.218          0.048
P5            -0.750      0.948     -0.198               0.069    -1.422         -0.552
P6             0.006      0.117     -0.123               0.129    -0.176          0.128
P7             0.350     -0.523      0.173              -0.026     0.785          0.176
P8             0.256     -0.457      0.202              -0.108     0.686          0.054
P9            -0.099     -0.116      0.214              -0.263     0.174         -0.313
Main effect    0.093     -0.457      0.363               1.654     0.685         -0.270



The ANOVA table is Table 8.22; it partitions the Herbicide main effect and
Herbicide.Population interaction sums of squares into components associated with
the two contrasts.

Variance ratios for all but one of the treatment mean squares are statistically
significant. The Herbicide sum of squares is partitioned into the two contrasts, which are
both highly significant ($F^{1v2}_{1,104} = 114.714$, $F^{AvC}_{1,104} = 13.392$, both P < 0.001). The type 1 versus
2 contrast is estimated as 0.685 (SE 0.0640), indicating that herbicides of type 1 (A and C)
yielded on average 0.685 units more on the square root scale than those of type 2 (B). The
herbicide A versus C contrast is estimated as -0.270 (SE 0.0739), indicating that herbicide
A yielded 0.27 units (on the square root scale) less than C on average. These patterns can
be seen in the full table of predicted means plotted in Figure 8.7.
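
As an illustrative check, the contrast estimates and their standard errors can be recomputed from the effects in Table 8.21 and the residual mean square in Table 8.22. The sketch below is not the original analysis: the function names are our own, and the SE formula sqrt(ResMS × Σc²/r) is the standard result for an equally replicated design, applied here on the assumption that each herbicide mean is based on r = 45 pots (9 populations × 5 blocks).

```python
from math import sqrt

# Contrast coefficients for herbicides A, B, C (Example 8.5).
type_1_vs_2 = {"A": 0.5, "B": -1.0, "C": 0.5}
a_vs_c = {"A": 1.0, "B": 0.0, "C": -1.0}

# Estimated effects copied from Table 8.21.
main_effects = {"A": 0.093, "B": -0.457, "C": 0.363}
p5_interaction = {"A": -0.750, "B": 0.948, "C": -0.198}


def contrast_value(coef, effects):
    """Apply contrast coefficients to a set of estimated effects."""
    return sum(coef[k] * effects[k] for k in coef)


def contrast_se(coef, res_ms, reps):
    """SE of a contrast among equally replicated means (assumed formula)."""
    return sqrt(res_ms * sum(c * c for c in coef.values()) / reps)


res_ms = 0.1227  # residual mean square from Table 8.22
reps = 45        # assumed: 9 populations x 5 blocks per herbicide

type_est = contrast_value(type_1_vs_2, main_effects)
avc_est = contrast_value(a_vs_c, main_effects)
type_se = contrast_se(type_1_vs_2, res_ms, reps)
avc_se = contrast_se(a_vs_c, res_ms, reps)

# The same coefficients applied within one population (here P5) give the
# interaction contrasts in the last two columns of Table 8.21.
p5_type = contrast_value(type_1_vs_2, p5_interaction)
p5_avc = contrast_value(a_vs_c, p5_interaction)

print(round(type_est, 3), round(type_se, 4))
print(round(avc_est, 3), round(avc_se, 4))
print(round(p5_type, 3), round(p5_avc, 3))
```

Small differences in the last decimal place relative to the quoted values can arise because the inputs are themselves rounded.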



TABLE 8.22

Multi-Stratum ANOVA Table for Black-Grass Fresh Weights (Square Root Scale) from the Herbicide
Efficacy Experiment (Example 8.5)

Source of Variation                   df    Sum of Squares   Mean Square   Variance Ratio               P
Rep stratum
  Residual                             4       1.3111           0.3278      2.671                       0.036
Rep.DPot stratum
  Herbicide                            2      15.7207           7.8604      64.053                    < 0.001
    Type 1 vs 2                        1      14.0774          14.0774      $F^{1v2}$ = 114.714       < 0.001
    Herbicide A vs C                   1       1.6434           1.6434      $F^{AvC}$ = 13.392        < 0.001
  Population                           8       3.5080           0.4385      $F^{P}$ = 3.573             0.001
  Herbicide.Population                16      13.9437           0.8715      7.102                     < 0.001
    (Type 1 vs 2).Population           8      12.4676           1.5585      $F^{1v2.P}$ = 12.700      < 0.001
    (Herbicide A vs C).Population      8       1.4761           0.1845      $F^{AvC.P}$ = 1.504         0.165
  Residual                           104      12.7626           0.1227
Total                                134      47.2461

Note: Treatment (factor Herbicide) sum of squares partitioned into comparisons between herbicides of types 1
(A and C) and 2 (B), and between herbicides A and C.
FIGURE 8.7

Predicted fresh weight (g, square root scale) with SED for nine black-grass populations (P1 ... P9) treated with
herbicides A (•), B (o) and C (•) (Example 8.5).



The main effect of population is highly significant ($F^{P}_{8,104} = 3.573$, P = 0.001),
reflecting overall differences in fresh weight obtained from the different populations
(averaged over herbicides). The Herbicide.Population interaction term is partitioned into the
interactions of the two contrasts with the Population factor. The interaction of the type 1
versus 2 contrast is highly significant ($F^{1v2.P}_{8,104} = 12.700$, P < 0.001), indicating that the
difference between the two types changes across populations, and indeed Figure 8.7
illustrates that this difference is strongly positive for populations P7 and P8, but negative for
P5. The interaction of the A versus C contrast is not significant ($F^{AvC.P}_{8,104} = 1.504$, P = 0.165),
indicating that the difference between these two herbicides is reasonably consistent
across the populations. Again, this pattern can be observed in Figure 8.7. We can
conclude that the relative effectiveness of the different herbicide types depends on the
population considered but that within herbicides of type 1, herbicide A is generally more
effective (lower fresh weight) than herbicide C.

We have seen in Example 8.5 that contrasts can be used to partition treatment infor- 
mation within a two-way crossed structure. This principle can be extended to nested or 
higher-level crossed structures and contrasts may be used to simplify the model terms 
required for prediction. If all of the significant treatment variation can be captured by 
a small set of contrasts, then a simplified model based on those contrasts can be used 
for prediction. This procedure is thus qualitatively different from the evaluation of treat- 
ment comparisons from the predictive model, which are discussed in Section 8.8.5. The 
approach is most useful when it is possible to write the treatment structure as a set of 
meaningful pairwise comparisons. 



8.7 Modelling Patterns for Quantitative Treatments 

When the groups associated with a treatment factor correspond to some real numeric
(quantitative) scale, we might think of building a model to describe the trend in the
response in terms of that numeric scale; for example, in Example 1.1, plant height increased
linearly in relation to dose. If we can describe this linear trend, then we can use it to
predict the response for any intermediate dose. Some responses are more complex, requiring
a curve: plant yield tends to respond linearly to nitrogen application initially then tail
off; fungal infection rates on plants tend to increase up to some optimal temperature and
then decrease for higher temperatures. In this section, we examine the use of contrasts for
fitting simple polynomial models to quantitative factors, i.e. factors where the groups
correspond to positions on some underlying numeric scale.

Here, we consider the numeric levels of a quantitative factor on each unit as a variate, x.
A polynomial model consists of several terms, each of which is a power of x multiplied
by a coefficient. The order of the polynomial is equal to the highest power of x present, so
a first-order polynomial describes a linear relationship. A second-order polynomial, or
quadratic model, also includes the second power or square of the explanatory variate, and
takes the generic form

$f(x_i) = \alpha + \beta_1 x_i + \beta_2 x_i^2$ .

This equation consists of three terms, and can be considered as three components: a
constant term ($\alpha$), a linear term ($\beta_1 x_i$) and a quadratic term ($\beta_2 x_i^2$). This model can be
considered as an example of polynomial regression (as presented in Section 17.1.2), but here we
use polynomial contrasts to fit models of this form.

As an example, consider an experiment set up as a RCBD with four blocks, looking at the
response of hydroponic plant growth (measured as biomass) to eight relative
concentrations of nutrient solution (0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2). A model for these data can be
written in mathematical form as

$y_{ij} = \mu + b_i + \tau_j + e_{ij}$ ,

where $b_i$ is the effect of the $i$th block, $i = 1 \dots 4$, $\tau_j$ is the effect of the $j$th concentration,
$j = 1 \dots 8$, and all other terms are as defined previously. Again, we use sum-to-zero
constraints. The second-order polynomial model is applied to the set of treatment effects, $\tau_j$,
$j = 1 \dots 8$. We form one contrast for each polynomial term (here constant, linear and
quadratic) using the appropriate power of x to give the contrast coefficients. The constant
contrast corresponds to $\sum_j \tau_j$ but, because of the sum-to-zero constraints, this is equal to zero
and so is omitted. The linear contrast takes the form

$(0.25 \times \tau_1) + (0.50 \times \tau_2) + (0.75 \times \tau_3) + (1.00 \times \tau_4) + (1.25 \times \tau_5) + (1.50 \times \tau_6) + (1.75 \times \tau_7) + (2.00 \times \tau_8)$ ,

with the concentration values being used as the contrast coefficients. The quadratic
contrast takes the form

$(0.25^2 \times \tau_1) + (0.50^2 \times \tau_2) + (0.75^2 \times \tau_3) + (1.00^2 \times \tau_4) + (1.25^2 \times \tau_5) + (1.50^2 \times \tau_6) + (1.75^2 \times \tau_7) + (2.00^2 \times \tau_8)$
$= (0.0625 \times \tau_1) + (0.25 \times \tau_2) + (0.5625 \times \tau_3) + (1.00 \times \tau_4) + (1.5625 \times \tau_5) + (2.25 \times \tau_6) + (3.0625 \times \tau_7) + (4.00 \times \tau_8)$ ,

with the square of the concentration values now being used as the contrast coefficients.

Unfortunately, this approach results in contrasts that are non-orthogonal, and so the
apparent importance of each term can depend on the order in which it is fitted, and the
estimated contrast value depends on which other contrasts are fitted. This
non-orthogonality can be seen in Figure 8.8a: all powers of x show an increasing pattern for x > 0 and
so have strong positive correlations across this range. This problem can be avoided by the
use of orthogonal polynomials, rather than simple powers. Orthogonal polynomials are
constructed so that the qth function is of order q, and is orthogonal to all of the lower order
functions. This means that the contrast for each component picks out the elements of the
pattern that are unique to that power. Figure 8.8b shows a set of orthogonal
polynomials: correlations within this set are all zero. The form of the orthogonal polynomials also
illustrates the complexity allowed within these models: a second-order polynomial can
accommodate one turning or inflexion point, a third-order model can have two turning
points and so on.

FIGURE 8.8

(a) Simple powers of explanatory variate: $x$ (solid), $x^2/2$ (dotted), $x^3/8$ (dashed), $x^4/32$ (dot-dashed); (b) orthogonal
polynomials of explanatory variate x of order 1 (solid), 2 (dotted), 3 (dashed) or 4 (dot-dashed).

Calculation of contrast coefficients for orthogonal polynomials is less straightforward
than for simple powers and these coefficients depend on both the quantitative factor
levels and their replication. In practice, statistical software will calculate the necessary
contrast coefficients. Once the contrast coefficients have been calculated, inference follows as
described in the previous section.
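
To make the idea concrete, the following Python sketch shows one way such coefficients could be obtained for the eight nutrient concentrations by Gram-Schmidt orthogonalization. It is an illustration under the assumption of equal replication at every level, not a description of what any particular statistical package does, and the helper names are our own.

```python
from math import sqrt

# The eight relative concentrations from the hydroponics example.
x = [0.25 * k for k in range(1, 9)]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def center(v):
    """Subtract the mean, i.e. make the vector orthogonal to the constant."""
    m = sum(v) / len(v)
    return [vi - m for vi in v]


def correlation(a, b):
    ca, cb = center(a), center(b)
    return dot(ca, cb) / sqrt(dot(ca, ca) * dot(cb, cb))


# Simple powers are strongly correlated over x > 0.
raw_corr = correlation(x, [xi ** 2 for xi in x])

# Gram-Schmidt: the linear coefficients are the centred levels; the quadratic
# coefficients are the centred squares with their linear component removed.
linear = center(x)
quad_centered = center([xi ** 2 for xi in x])
projection = dot(quad_centered, linear) / dot(linear, linear)
quadratic = [q - projection * l for q, l in zip(quad_centered, linear)]

print("correlation of x and x^2:", round(raw_corr, 3))
print("linear:", [round(c, 4) for c in linear])
print("quadratic:", [round(c, 4) for c in quadratic])
print("linear . quadratic:", round(dot(linear, quadratic), 12))
```

With unequal replication the sums above would need to be weighted by the number of replicates at each level, which is why the coefficients depend on replication as well as on the levels.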

In theory, it is possible to fit t - 1 orthogonal polynomials for a quantitative factor with t
levels, i.e. a polynomial of order t - 1. However, this polynomial model would give exactly
the same fit as use of the factor itself, and interpolation between factor levels would be
uninformative; this is illustrated in Section 17.1.2 in the context of polynomial regression.
The usual aim is to find a low-order (i.e. parsimonious) polynomial to describe the general
trend across factor levels. Variation due to the quantitative factor that is not accounted for
by these lower-order contrasts is usually allocated as a remainder term that
amalgamates variation associated with higher-order polynomial terms. This remainder can then
be tested against the appropriate residual term to ensure that there is no statistically
significant variation associated with the higher-order terms. This remainder is sometimes
also called 'lack of fit' and is discussed further in Section 12.8 in the context of regression
models.

EXAMPLE 8.6: VOLTAGE RESPONSE 

An experiment was conducted to investigate the affinity of a sugar transporter protein
for a substrate within plant cells. A range of voltages associated with different sugar
concentrations was tested, and the response was measured in terms of electric current
(variate Km). Nine different voltages were used, in increasing steps from -160 to 0 mV
(factor Voltage). The experiment was set up as a RCBD, with blocks corresponding to
two different occasions (factor Rep) with one replicate of each voltage measured during
each occasion (factor called DUnit, as the actual randomization of plants within runs is
not available). The data are listed in Table 8.23 and held in file voltage.dat.

A natural logarithm transformation, logKm = log(Km), was used to stabilize the
variances. The model for the data can be written in symbolic form as

Response variable: logKm

Explanatory component: [1] + Voltage

Structural component: Rep/DUnit

The corresponding mathematical model is written as

$\text{logKm}_{ij} = \mu + \text{Rep}_i + \text{Voltage}_j + e_{ij}$ ,

where $\text{logKm}_{ij}$ is the $\log_e$-transformed observed current in the $i$th replicate ($i = 1, 2$) for
the $j$th level of voltage applied ($j = 1 \dots 9$ for -160, -140 ... 0 mV, respectively). The
structural component generates the replicate effects, denoted $\text{Rep}_i$ for $i = 1, 2$, and the
deviations $e_{ij}$ (equivalent to term Rep.DUnit). The effect of the $j$th voltage is denoted $\text{Voltage}_j$.
The sum-to-zero constraints take the form $\sum_i \text{Rep}_i = 0$ and $\sum_j \text{Voltage}_j = 0$.

The predicted treatment means for this model (presented with the data in Figure 8.9a) 
show a broadly linear pattern of increase in response as voltage increases with a sug- 
gestion of slight curvature. This pattern can be investigated further by use of linear 
and quadratic polynomial contrasts. Table 8.24 lists the estimated treatment effects and 
coefficients for orthogonal linear and quadratic polynomial contrasts across voltages. 

TABLE 8.23

Electric Current (Km) Observed in Plant Cells as a Response to Different Voltages Applied,
Using a RCBD with Two Replicates (Example 8.6 and File voltage.dat)

                      Km
Voltage (mV)    Rep 1     Rep 2
-160            0.234     0.219
-140            0.320     0.227
-120            0.326     0.282
-100            0.327     0.277
 -80            0.331     0.343
 -60            0.489     0.386
 -40            0.437     0.421
 -20            0.786     0.476
   0            0.842     0.611

Source: Data from Rothamsted Research (T. Miller).

FIGURE 8.9

(a) Observed electric current (o rep 1, • rep 2; $\log_e$(Km)) and fitted treatment means (•) for different levels of
voltage and with (b) fitted linear component of trend (solid line) (Example 8.6). Horizontal axis: Voltage.

TABLE 8.24

Calculation of Coefficients for Orthogonal Linear and Quadratic
Polynomial Contrasts for Electrical Response to Voltages (Example 8.6)

            Estimated          Coefficients for   Coefficients for
Voltage     Voltage Effects    Linear Trend       Quadratic Trend
-160        -0.5097            -80                 3733
-140        -0.3353            -60                  933
-120        -0.2175            -40                -1067
-100        -0.2249            -20                -2267
 -80        -0.1120              0                -2667
 -60         0.1422             20                -2267
 -40         0.1294             40                -1067
 -20         0.4843             60                  933
   0         0.6435             80                 3733

We evaluate the contrasts by multiplying the coefficients by the estimated voltage
effects, which give the linear contrast equal to 162.6 and the quadratic contrast equal
to 1218.8, but these values must be re-scaled (to correspond to standardized contrasts)
before they can be related to the polynomial model for the response, and this is done
automatically by statistical software. Table 8.25 is the ANOVA table with the Voltage
sum of squares partitioned into components corresponding to the linear contrast, the
quadratic contrast and a remainder.

TABLE 8.25

Multi-Stratum ANOVA Table for the Electrical Responses from the Voltage Experiment with
Treatment (Factor Voltage) Sum of Squares Partitioned into Components for Linear and Quadratic
Trend and a Remainder (Example 8.6)

Source of Variation       df   Sum of Squares   Mean Square   Variance Ratio            P
Rep stratum
  Residual                 1      0.17622         0.17622     12.185                    0.008
Rep.DUnit stratum
  Voltage                  8      2.33653         0.29207     20.195                  < 0.001
    Linear contrast        1      2.20458         2.20458     152.435                 < 0.001
    Quadratic contrast     1      0.06029         0.06029     $F^{Quad}$ = 4.169        0.075
    Remainder              6      0.07166         0.01194     $F^{Rem}$ = 0.826         0.581
  Residual                 8      0.11570         0.01446
Total                     17      2.62845

The variance ratio for the remainder term is not
significant ($F^{Rem}_{6,8} = 0.83$, P = 0.581), indicating no need for higher-order terms. The
quadratic component is close to significant ($F^{Quad}_{1,8} = 4.17$, P = 0.075), indicating weak
evidence for a quadratic component of trend. However, this is small compared with the
linear component of trend ($F^{Lin}_{1,8} = 152.44$, P < 0.001), which clearly dominates the
pattern. Given the small size of the quadratic component compared with the linear
component of trend, we can ignore the quadratic component and allocate it to the remainder
(which must then be recalculated). The fitted linear trend model is shown in Figure 8.9b
and takes the form (see Exercise 15.8)

$\hat{\mu}_j = -0.434 + 0.0068 \times \text{Voltage}_j$ .
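
The contrast arithmetic in Example 8.6 can be traced with a short Python sketch. The effects are copied from Table 8.24; the coefficients are rebuilt from the voltage levels (centred values for the linear trend, centred squares for the quadratic trend, which suffices here because the levels are equally spaced and equally replicated). The re-scaling shown, dividing by the sum of squared coefficients, and the sum of squares formula r × L² / Σc² are standard results that we assume match what the software does; the names are our own.

```python
voltages = list(range(-160, 1, 20))  # -160, -140, ..., 0 mV
effects = [-0.5097, -0.3353, -0.2175, -0.2249, -0.1120,
           0.1422, 0.1294, 0.4843, 0.6435]  # Table 8.24
reps = 2  # two replicates of each voltage

# Orthogonal polynomial coefficients for equally spaced levels.
mean_v = sum(voltages) / len(voltages)
linear = [v - mean_v for v in voltages]                 # -80 ... 80
mean_sq = sum(c * c for c in linear) / len(linear)
quadratic = [c * c - mean_sq for c in linear]           # 3733.3 ... -2666.7 ... 3733.3


def contrast(coef, eff):
    """Unscaled contrast value: sum of coefficient x effect."""
    return sum(c * e for c, e in zip(coef, eff))


def contrast_ss(coef, eff, r):
    """Single-df sum of squares for a contrast among means of r replicates."""
    return r * contrast(coef, eff) ** 2 / sum(c * c for c in coef)


lin_value = contrast(linear, effects)
quad_value = contrast(quadratic, effects)
lin_ss = contrast_ss(linear, effects, reps)
quad_ss = contrast_ss(quadratic, effects, reps)

# Re-scaling the linear contrast by the sum of squared coefficients gives the
# slope of the fitted linear trend per mV.
slope = lin_value / sum(c * c for c in linear)

print("linear contrast:", round(lin_value, 2), "SS:", round(lin_ss, 5))
print("quadratic contrast:", round(quad_value, 2), "SS:", round(quad_ss, 5))
print("slope:", round(slope, 4))
```

The results agree with the values quoted in the example to within the rounding of the tabulated effects.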



In general, fitting quantitative trends by polynomial contrasts is a much less direct 
approach than regression (Chapters 12 to 15 and 17). However, it can be difficult to account 
adequately for structure within the regression context (Section 11.6). For quantitative fac- 
tors in designed experiments, it is therefore usually advantageous to start with polynomial 
contrasts to investigate the presence and complexity of trend. If the observations have no 
structure, this process can be followed by regression analysis, allowing for other treatment 
factors present (as in Chapter 15). If there is structure present then linear mixed models 
(Chapter 16) can be used as a framework for regression modelling that includes a struc- 
tural component. 



8.8 Making Treatment Comparisons from Predicted Means 

In this section, we consider issues that arise in making treatment comparisons from tables 
of predicted means. These methods should be used following analysis with an appropri- 
ate explanatory model that reflects the experimental aims, as described in the preceding 
sections. The most common type of comparison is a simple pairwise difference of two 
treatments but more complex functions, such as contrasts, may also be of interest. We first 
consider the case of simple pairwise comparisons and return to contrasts in Section 8.8.5. 

In the simplest case, the aim is to test a null hypothesis of equality between a pair of
treatment population means, for example, $H_0$: $\mu_i = \mu_j$ for treatments i and j or, equivalently,
to form a CI for the difference $\mu_i - \mu_j$. Recall that $\hat{\mu}_j$ denotes the predicted mean for the jth
treatment group for $j = 1 \dots t$. As introduced in Section 4.4, these hypotheses can be
investigated with statistics of the form

$t = \dfrac{\hat{\mu}_i - \hat{\mu}_j}{\mathrm{SE}(\hat{\mu}_i - \hat{\mu}_j)}$ ,

where the numerator is a difference between the predicted treatment means and the
denominator is their SED. This statistic has a t-distribution with df equal to the ResDF
from the ANOVA table. The test is evaluated against a two-sided alternative hypothesis,
$H_1$: $\mu_i \ne \mu_j$, at a specified significance level $\alpha_s$, typically $\alpha_s = 0.05$, known as the
comparison-wise significance level. As indicated in Section 2.3.2, $\alpha_s$ is the Type I error, the probability
of rejecting the null hypothesis when in fact it is true. The Type I error can therefore also
be interpreted as the probability of obtaining a single false-positive result, i.e. declaring a
difference significant when in fact it is zero. It is important to realize that the Type I error
rate applies to each individual hypothesis test done as part of a statistical analysis, and if
we perform several tests then the probability of a false-positive result increases with the
number of tests; this is sometimes referred to as the problem of multiple testing. If we
make m independent tests at significance level $\alpha_s$, then we can regard the number of
false-positive results as having a Binomial distribution (Section 2.2.1) with m trials and success
probability $\alpha_s$. It follows that

Prob(at least one false positive) $= \alpha_e = 1 - (1 - \alpha_s)^m$ ,

where $\alpha_e$ is known as the experiment-wise Type I error. For example, if we do 15
independent tests with $\alpha_s = 0.05$, then $\alpha_e = 1 - (0.95)^{15} = 1 - 0.463 = 0.537$, i.e. a 53.7% chance of one
or more false-positive results. However, in our context of treatment comparisons from a
single experiment, the tests are not independent because their denominators, the SED for
each comparison, are based on the same ResMS. When hypothesis tests are not
independent, there is less certainty about how the Type I error rate accumulates, as this depends
on the degree of dependence between the tests: the greater the degree of dependence, the
smaller the rate of increase in experiment-wise error rate, with

$\alpha_s \le \alpha_e \le m \times \alpha_s$ .

The lower limit holds only in the case when the tests are perfectly correlated.
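
The growth of the experiment-wise error rate with the number of independent tests can be illustrated with a minimal Python sketch; the function name is our own, and the formula applies only under the independence assumption stated above.

```python
def experimentwise_error(alpha_s, m):
    """Probability of at least one false positive in m independent tests."""
    return 1 - (1 - alpha_s) ** m


alpha_s = 0.05
m = 15
alpha_e = experimentwise_error(alpha_s, m)

# Bounds that hold whatever the dependence between tests.
lower_bound = alpha_s
upper_bound = min(1.0, m * alpha_s)

print("experiment-wise error for 15 tests:", round(alpha_e, 3))
for n_tests in (1, 5, 15, 45):
    print(n_tests, round(experimentwise_error(alpha_s, n_tests), 3))
```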

Here, we first consider two general approaches for dealing with multiple tests: the
Bonferroni correction (Section 8.8.1) and the false discovery rate (Section 8.8.2). We then
go on to discuss some more specific approaches for some common scenarios: pairwise
comparison of all means within a table (often called multiple comparisons, Section 8.8.3);
comparison of a set of treatments against a control or standard (Section 8.8.4); and
evaluation of a pre-planned set of comparisons or contrasts (Section 8.8.5).

8.8.1 The Bonferroni Correction 

The Bonferroni correction adjusts the Type I error rate for each comparison, $\alpha_s$,
downwards. The adjustment is based on the number of comparisons, m, to be evaluated, and
aims to achieve the desired experiment-wise error, $\alpha_e$. The Bonferroni inequality was used
above to put an upper limit on the experiment-wise error rate for m comparisons each
made at significance level $\alpha_s$, namely $\alpha_e \le m \times \alpha_s$.

The Bonferroni correction uses significance level $\alpha_s' = \alpha_e / m$ for each individual
comparison, so that the experiment-wise error rate becomes bounded above by $\alpha_e$. For example, if
we make 15 comparisons with a comparison-wise significance level of $\alpha_s' = 0.003333$, then
the experiment-wise error rate is $\le 15 \times \alpha_s' = 0.05$. Use of $\alpha_s'$ in place of $\alpha_s$ means that the
critical value of the test statistic required to obtain a significant result for any individual
comparison increases. For example, if our case of 15 comparisons has 18 ResDF, then the
critical value of the t-distribution moves from 2.10 to 3.38, i.e. absolute treatment
differences need to be 1.6 times larger to be significant; however, we shall control the number of
false positives.

The main disadvantage of this approach is that, where many comparisons are made,
absolute treatment differences often have to become very large to exceed the
Bonferroni-corrected critical value, and so power (the probability of detecting a difference if one is
present, Section 10.3) is likely to fall considerably.
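
A minimal Python sketch of the Bonferroni adjustment follows. The function names and the small set of P-values are hypothetical and purely illustrative; the two critical t values are those quoted in the text for 18 ResDF rather than being computed here.

```python
def bonferroni_level(alpha_e, m):
    """Comparison-wise significance level that bounds the experiment-wise rate."""
    return alpha_e / m


def bonferroni_reject(p_values, alpha_e):
    """Flag each comparison whose P-value falls below the adjusted level."""
    level = bonferroni_level(alpha_e, len(p_values))
    return [p <= level for p in p_values]


alpha_adj = bonferroni_level(0.05, 15)

# Critical values quoted in the text for 18 residual df.
t_unadjusted, t_bonferroni = 2.10, 3.38
ratio = t_bonferroni / t_unadjusted

# Hypothetical P-values from three comparisons (adjusted level 0.05/3).
decisions = bonferroni_reject([0.001, 0.02, 0.03], 0.05)

print("adjusted level:", round(alpha_adj, 6))
print("ratio of critical values:", round(ratio, 1))
print("decisions:", decisions)
```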
8.8.2 The False Discovery Rate

The false discovery rate, FDR, introduced by Benjamini and Hochberg (1995), is a different
type of approach that does not attempt to control for the experiment-wise error rate, but
instead seeks to quantify the expected proportion of Type I errors within the set of rejected
hypotheses. So an FDR of 0.05 means that 5% of the differences that have been found
statistically significant are expected to be false-positive results.

There are two ways to calculate the FDR for a given set of comparisons. One method
fixes the significance level for individual tests and then calculates the observed FDR.
The other calculates the required significance level, $\alpha_s'$, for individual tests required to
achieve a pre-specified value of FDR. Both methods are applied after the test results have
been obtained, and we outline them both below.

We start by calculating the observed FDR for m comparisons made with
comparison-wise significance level $\alpha_s$. The observed FDR is calculated as the ratio of the expected
number of significant results under the null hypothesis to the observed number of statistically
significant results, s, or

$\mathrm{FDR} = \dfrac{m \times \alpha_s}{s}$ .

For example, suppose that we make 200 comparisons at $\alpha_s = 0.05$, of which 24 gave a
significant result. Then FDR = 200 × 0.05/24 = 0.417, i.e. it is expected that 41.7% of the
24 significant comparisons, i.e. approximately 10 of them, will correspond to false-positive
results.

The procedure to set the comparison-wise significance level to obtain a given level of
FDR is a little more complicated. First, we rank the observed significance levels from the
individual comparisons in ascending order as

$P_{(1)} \le P_{(2)} \le \dots \le P_{(m)}$ ,

where subscript (i) indicates the ith most significant test (i.e. the ith smallest observed
significance level). We then calculate the values $m \times P_{(k)} / k$ for $k = 1 \dots m$. For control of the false
discovery rate at level FDR, we find the largest value k such that

$m \times P_{(k)} / k \le \mathrm{FDR}$ ,

and reject all null hypotheses with $P_{(i)} \le P_{(k)}$. If there is no k that satisfies that condition,
then none of the hypotheses are rejected.
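
Both calculations can be sketched in Python as follows. This is an illustrative implementation of the step-up rule just described, not code from any particular package; the function names and the ten P-values are hypothetical.

```python
def observed_fdr(m, alpha_s, s):
    """Expected false positives under the null divided by observed positives."""
    return m * alpha_s / s


def fdr_reject(p_values, fdr):
    """Step-up rule: find the largest k with m * P(k) / k <= fdr and reject
    the k hypotheses with the smallest P-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if m * p_values[idx] / rank <= fdr:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Worked example from the text: 200 comparisons, 24 significant at 0.05.
example_fdr = observed_fdr(200, 0.05, 24)

# Hypothetical P-values for ten comparisons, controlled at FDR = 0.05.
p_values = [0.041, 0.001, 0.205, 0.008, 0.039, 0.042, 0.060, 0.074, 0.212, 0.216]
rejected = fdr_reject(p_values, 0.05)

print("observed FDR:", round(example_fdr, 3))
print("rejected:", rejected)
```

In this hypothetical set, only the two smallest P-values (0.001 and 0.008) satisfy the condition, so only those two null hypotheses are rejected, whereas five P-values fall below an unadjusted 0.05.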

This procedure can be followed for any set of m independent tests, and for dependent
tests that meet certain conditions (see Benjamini and Yekutieli, 2001 for more details).
This includes most situations of pairwise comparisons (Section 8.8.3) and comparisons
of treatments with control (Section 8.8.4). For other sets of dependent tests the
expressions above are modified by replacing the total number of comparisons, m, by m* which
is calculated as

$m^* = m \times \sum_{i=1}^{m} \dfrac{1}{i}$ .

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The FDR approach seems a good compromise between two extremes: either ignoring the 
problem of multiple testing (as when we use unadjusted LSDs, Section 8. 8.3.1), which may 
lead to many (unrecognized) false posifives; or specificafion of fhe experimenf-wise error, 
which may give a loss of power (as when we use fhe Bonferroni correcfion). 



8.8.3 All Pairwise Comparisons 

In this section, we consider several different methods used for making all pairwise
comparisons (often called multiple comparisons) within a table of predicted means. This is
most commonly used for an unstructured set of treatments, but it may also be used to
investigate a table of means from a structured set of treatments. There are t × (t − 1)/2
pairwise comparisons for a set of t means, and the number of comparisons thus increases
approximately in proportion to the square of the number of means. For example, for four
treatment groups there are six possible pairwise comparisons, but for 10 treatment groups
there are 45 possible pairwise comparisons. The set of tests associated with these
comparisons is not independent. Here, we consider the use of the LSD, multiple range tests
and Tukey's simultaneous confidence intervals for pairwise treatment comparisons. We assume
that all of the treatment comparisons are estimated with equal precision, i.e. that a single
common SED applies to the table of predicted means, with associated residual df denoted ResDF.
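As a quick check on how fast the number of pairwise comparisons grows, the short sketch below evaluates t × (t − 1)/2 and confirms it by enumerating the pairs. The helper name `n_pairwise` is an illustrative choice.

```python
from itertools import combinations


def n_pairwise(t: int) -> int:
    """Number of pairwise comparisons among t means: t * (t - 1) / 2."""
    return t * (t - 1) // 2


for t in (4, 10):
    pairs = list(combinations(range(1, t + 1), 2))  # enumerate the pairs explicitly
    print(t, n_pairwise(t), len(pairs))
```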

In the context of multiple comparisons, we often rank the t predicted treatment means as

$$\hat{\mu}_{(1)} \ge \hat{\mu}_{(2)} \ge \dots \ge \hat{\mu}_{(t)} ,$$

where the subscript (i) denotes the ith largest mean, and differences within this ordered
set are then examined. The statistical properties of this ordered set differ from those of
a random sample and inference requires the distribution of the range of an ordered set
under the null hypothesis that the population effects are all equal; this is known as the
Studentized range distribution. Quantiles for this distribution are available in most
statistical software. We denote the 100(1 − $\alpha_s$)th percentile of the Studentized range
distribution for t groups with ResDF residual df as $q^{[\alpha_s]}_{t,\mathrm{ResDF}}$.



8.8.3.1 The LSD and Fisher's Protected LSD

The least significant difference, LSD, was introduced in Section 4.4. For two treatments
labelled as i and j, the LSD was defined as the smallest absolute difference that would
result in rejection of the null hypothesis $H_0: \mu_i = \mu_j$ at significance level
$\alpha_s$, and was calculated as

$$\mathrm{LSD} = t^{[\alpha_s/2]}_{\mathrm{ResDF}} \times \mathrm{SED} ,$$

where $t^{[\alpha_s/2]}_{\mathrm{ResDF}}$ is the 100(1 − $\alpha_s$/2)th percentile of the
t-distribution with ResDF df.

The unprotected LSD approach to multiple comparisons rejects the null hypothesis for
any pair of treatments whose absolute difference equals or exceeds the LSD, i.e. where
$|\hat{\mu}_i - \hat{\mu}_j| \ge \mathrm{LSD}$. This approach provides no control of the
experiment-wise error rate, which therefore increases with the total number of pairwise
comparisons as described above, although the Type I error rate for each individual
comparison is maintained at level $\alpha_s$.

The protected LSD procedure differs only in its requirement that the overall F-test for
the null hypothesis $H_0: \mu_1 = \dots = \mu_t$ must be rejected before any individual
comparisons are evaluated. However, this procedure gives no additional control of
experiment-wise error. It is possible (although uncommon) to obtain a significant
F-statistic without any of the pairwise treatment differences exceeding the LSD.
Conversely, it is also possible to obtain a non-significant F-statistic when there is one
or more significant pairwise differences within the set of treatment comparisons. The
protection afforded by the F-test may therefore be illusory.

Because these procedures provide no control of the experiment-wise error rate, they
should be used only when this control is regarded as unimportant. A simple way of
introducing this experiment-wise control suggested by Hsu (1996, Section 4.1.8) is to use an
adjusted version, the aLSD, calculated as

$$\mathrm{aLSD} = \frac{q^{[\alpha_s]}_{t,\mathrm{ResDF}}}{\sqrt{2}} \times \mathrm{SED} .$$



8.8.3.2 Multiple Range Tests

Multiple range tests work on the ranked set of predicted means and are used to identify
groups of treatments with a similar response. The main difference between the most
common procedures is in the significance level used at each stage. Here, we consider the
Newman-Keuls and Duncan's multiple range tests, as being among those most commonly
used in practice. These procedures define a comparison-wise significance level $\alpha'_s$
(defined below) then run as follows.

Step 1: Compare the t-statistic for the largest and smallest means, i.e.
$t_{(1)(t)} = (\hat{\mu}_{(1)} - \hat{\mu}_{(t)})/\mathrm{SED}$, with
$q^{[\alpha'_s]}_{t,\mathrm{ResDF}}$, the 100(1 − $\alpha'_s$)th percentile of the Studentized
range distribution for t groups with ResDF df. If
$t_{(1)(t)} < q^{[\alpha'_s]}_{t,\mathrm{ResDF}}$ then we conclude that there are no differences
within this set of means and stop. Otherwise, we conclude that some differences are
present, and move onto step 2.

Step 2: Repeat the procedure on the test statistics
$t_{(1)(t-1)} = (\hat{\mu}_{(1)} - \hat{\mu}_{(t-1)})/\mathrm{SED}$ and
$t_{(2)(t)} = (\hat{\mu}_{(2)} - \hat{\mu}_{(t)})/\mathrm{SED}$, adjusting the number of groups
to t − 1 for the Studentized range distribution and testing whether
$t_{(1)(t-1)} < q^{[\alpha'_s]}_{t-1,\mathrm{ResDF}}$ or
$t_{(2)(t)} < q^{[\alpha'_s]}_{t-1,\mathrm{ResDF}}$. If differences are present within a set,
then we proceed to test subsets of t − 2 adjacent means within that set.

The procedures continue in this manner, working with progressively smaller subsets
until all differences are less than the required value, giving groups of means that can be
considered as not significantly different. For each subset so identified, a common letter is
allocated to all members. The only exception is that a new letter is not allocated to any
subset of a group already found to contain no differences. Any mean not allocated to a group
at the end of the procedure is assigned its own letter. This process is illustrated in Table
8.26. In this example the groups are distinct, but in many cases they will overlap.

The Newman-Keuls method uses $\alpha'_s = \alpha_s$ at each step, where $\alpha_s$ is the
comparison-wise error rate; this results in an experiment-wise error rate greater than
$\alpha_s$. Duncan's multiple range test uses $\alpha'_s = 1 - (1 - \alpha_s)^{u-1}$, where u
is the size of the subset being tested at each stage. These values are much larger than
$\alpha_s$ when u is large, so this procedure is more lax than Newman-Keuls at the initial
stages. This approach is intended to preserve the comparison-wise error rate at $\alpha_s$.
The experiment-wise error rate for Duncan's multiple range test can therefore be
considerably larger than $\alpha_s$.
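To see how quickly Duncan's significance level grows with the size of the subset, the sketch below tabulates both levels for subsets of two to five means, taking a comparison-wise level of 0.05 as an assumed example. The function name `duncan_level` is illustrative.

```python
def duncan_level(alpha_s: float, u: int) -> float:
    """Duncan's significance level for a subset of u means: 1 - (1 - alpha_s)**(u - 1)."""
    return 1.0 - (1.0 - alpha_s) ** (u - 1)


alpha_s = 0.05  # assumed comparison-wise significance level
for u in range(2, 6):
    newman_keuls = alpha_s  # Newman-Keuls keeps the same level at every step
    duncan = duncan_level(alpha_s, u)
    print(f"u = {u}: Newman-Keuls {newman_keuls:.4f}, Duncan {duncan:.4f}")
```

For u = 2 the two procedures use the same level, and the gap widens as u increases.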

In both cases, the actual experiment-wise error is difficult to determine (other than that
it is greater than $\alpha_s$) and may depend on the unknown configuration of the true
population means. This ambiguity in the experiment-wise error rate is a major drawback to
these methods and we therefore do not recommend their use.

TABLE 8.26



Schematic Representation of a Multiple Range Test with Five Treatments

Step    Difference tested    Means in set    Set        Result                    Action
1       μ(1) - μ(5)          (1) to (5)      -          t(1)(5) > q(5, ResDF)     Significant, test subsets.
2       μ(1) - μ(4)          (1) to (4)      0.1        t(1)(4) > q(4, ResDF)     Significant, test subsets.
        μ(2) - μ(5)          (2) to (5)      0.2        t(2)(5) < q(4, ResDF)     Not significant, stop.
3       μ(1) - μ(3)          (1) to (3)      0.1.1      t(1)(3) > q(3, ResDF)     Significant, test subsets.
        μ(2) - μ(4)          (2) to (4)      0.1.2      t(2)(4) < q(3, ResDF)     Not significant, stop.
4       μ(1) - μ(2)          (1), (2)        0.1.1.1    t(1)(2) > q(2, ResDF)     Significant, no subsets, stop.
        μ(2) - μ(3)          (2), (3)        0.1.1.2    t(2)(3) < q(2, ResDF)     Not significant, stop.
Final                                                                             Assign letters.

Letters assigned:    μ(1) a    μ(2) b    μ(3) b    μ(4) b    μ(5) b

Note: Steps, and tests within steps, are executed in sequential order. Set numbers relate to
those from the preceding step, for example, 0.1.1 and 0.1.2 arise as subsets from set
0.1. Here μ(1) to μ(5) denote the ordered predicted means and q(u, ResDF) denotes
$q^{[\alpha'_s]}_{u,\mathrm{ResDF}}$. The original schematic represents the ordered means as
rectangles, with the means compared in each step shown as filled bars (black for means
found to differ, grey for those that do not); the column 'Means in set' lists the means
compared. At final step, groups of treatments found not to differ are assigned a common
letter.



8.8.3.3 Tukey's Simultaneous Confidence Intervals

Finally in this section, we describe the use of Tukey's simultaneous confidence intervals,
where the coverage probability applies to the full set of intervals. For treatments i and j,
the 100(1 − $\alpha_s$)% confidence interval for the comparison $\mu_i - \mu_j$ is

$$(\hat{\mu}_i - \hat{\mu}_j) \pm \frac{q^{[\alpha_s]}_{t,\mathrm{ResDF}}}{\sqrt{2}} \times \mathrm{SED} .$$

This approach is perhaps more useful than simple testing, as it provides a range of
plausible values for each comparison. Both the position and length of these confidence
intervals may give useful information on treatment differences.

All of the formulae given in this section assume that treatment groups have equal
replication and hence equal precision. Calculations become more complex when groups have
unequal replication, and hence comparisons have unequal precision. In particular, this
may give some inconsistencies in the groups formed at different steps within the multiple
range tests.



8.8.4 Comparison of Treatments against a Control 

The comparison of treatments against a control or standard treatment is a common
requirement in screening trials, where a set of new treatments is evaluated against standard
practice. This set of comparisons is a subset of all pairwise comparisons, but we can achieve
more power by recognizing the structure of the subset. Dunnett's method is simple and
constructs a set of confidence limits for the comparison of each new treatment population
mean ($\mu_j$, j = 2 ... t) with the control ($\mu_1$).

If the aim is to detect treatments that give a larger value than the control (a one-sided
test) then the method generates lower limits for the difference of treatments with the
control as

$$(\hat{\mu}_j - \hat{\mu}_1) - d^{[\alpha_s]}_{t-1,\mathrm{ResDF}} \times \mathrm{SED} ,$$

where $d^{[\alpha_s]}_{t-1,\mathrm{ResDF}}$ is the 100(1 − $\alpha_s$)th percentile of Dunnett's
distribution for t − 1 treatment groups (excluding the control) and ResDF df for the SED.
Quantiles of Dunnett's distribution are available in most statistical software. Any
treatment with a lower limit greater than zero can then be considered as larger than the
control at significance level $\alpha_s$. If the aim is the detection of treatments that give
a smaller value than the control (another one-sided test) then the method generates upper
limits as

$$(\hat{\mu}_j - \hat{\mu}_1) + d^{[\alpha_s]}_{t-1,\mathrm{ResDF}} \times \mathrm{SED} .$$

Any treatment with an upper limit less than zero can be considered as smaller than the
control. For a two-sided test, you should calculate both limits after adjusting the critical
value, using

$$(\hat{\mu}_j - \hat{\mu}_1) \pm d^{[\alpha_s/2]}_{t-1,\mathrm{ResDF}} \times \mathrm{SED} ,$$

and consider any treatment with either a lower limit greater than zero or an upper limit
less than zero as different from the control.



8.8.5 Evaluation of a Set of Pre-Planned Comparisons 

In some situations, there may be a pre-planned subset of treatment comparisons that are of 
particular interest. To qualify as pre-planned, the comparisons must be determined before 
any results are obtained; this matter is discussed further in Section 8.8.6. In this more gen- 
eral situation, it is difficult to obtain an optimal strategy, and so we deal with the problem 
of multiple testing by using the methods of Sections 8.8.1 and 8.8.2. If controlling Type I 
error is the main concern, then a Bonferroni correction to the significance level would be 
appropriate. If we wish to retain power, but with some insight into the false-positive rate, 
then use of the FDR may be more appropriate.

In this context, comparisons that are more complex than simple differences, such as
treatment contrasts, may be of interest. Here, a contrast is defined as a linear function of
the population means, i.e. of the form

$$l_1\mu_1 + l_2\mu_2 + \dots + l_t\mu_t = \sum_{j=1}^{t} l_j \mu_j ,$$

which is estimated by substitution of the predicted means in place of the unknown true
population means. Estimation of the SE for this contrast is more difficult than with effects
(Section 8.6) because correlations between the predicted means must be accounted for, but
this can be done with statistical software.
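Estimating a contrast by substitution amounts to a weighted sum of the predicted means. The sketch below illustrates this with hypothetical means and coefficients (a control and two new treatments); the function name `contrast_estimate` is an illustrative choice, and the SE is deliberately not computed because, as noted above, it depends on the correlations between the predicted means.

```python
def contrast_estimate(coefficients, predicted_means):
    """Estimate a contrast as the sum of l_j * (predicted mean j)."""
    if len(coefficients) != len(predicted_means):
        raise ValueError("need one coefficient per predicted mean")
    return sum(l * m for l, m in zip(coefficients, predicted_means))


# Hypothetical predicted means for a control and two new treatments.
means = [10.0, 12.5, 13.5]
# Hypothetical contrast: average of the two new treatments minus the control.
coeffs = [-1.0, 0.5, 0.5]
estimate = contrast_estimate(coeffs, means)
print(estimate)
```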

Before our final summing up, we consider an example in which a set of pre-planned
comparisons is of interest, and use all of the methods discussed in this section to
demonstrate some of the differences between them.

EXAMPLE 8.7: LUPIN VARIETY TRIAL 

A field trial was set up to evaluate the overall performance of a set of lupin breeding 
lines. The experiment was laid out as a RCBD with three blocks of 14 plots (factors 
Block and Plot). Fourteen different lines were tested (factor Line), comprising 12 dwarf 
lines (DTN lines) and two non-dwarf lines (CH-304 lines). Performance across a range 
of characteristics, including the average number of plants per square metre (variate 
NPlant) and oil yield (t/ha, variate OilYield), was to be compared with the candidate
variety for release, line DTN20. Here, we analyse oil yields. The data are held in file
lupintrial.dat and listed in Table 8.27.



TABLE 8.27

Average Number of Plants (NPlant) and Oil Yield (t/ha, Yield) from a RCBD with Three Blocks and
14 Lupin Breeding Lines (Example 8.7 and File lupintrial.dat)

        Block 1                       Block 2                       Block 3
Plot    Line        NPlant  Yield     Line        NPlant  Yield     Line        NPlant  Yield
1       DTN84       16.68   0.36      DTN84       31.13   0.34      DTN31       26.68   0.21
2       DTN108      24.46   0.58      DTN12       31.13   0.36      DTN78       28.90   0.36
3       DTN78       37.80   0.39      DTN04       24.46   0.33      DTN10       28.90   0.32
4       DTN19B      37.80   0.38      DTN11       55.58   0.38      CH304-70    26.68   0.24
5       CH304-73    22.23   0.37      DTN19B      37.80   0.33      DTN84       26.68   0.58
6       DTN10       37.80   0.30      DTN10       28.90   0.34      DTN108       6.67   0.56
7       DTN11       24.46   0.29      DTN108      26.68   0.54      DTN20       24.46   0.41
8       DTN19A      26.68   0.17      DTN20       24.46   0.35      DTN19B      40.02   0.32
9       DTN04       35.57   0.26      DTN31       31.13   0.18      DTN19A       8.89   0.30
10      DTN31       24.46   0.19      DTN01       37.80   0.37      DTN01       31.13   0.38
11      CH304-70    26.68   0.22      CH304-73    35.57   0.24      DTN04       28.90   0.23
12      DTN20       31.13   0.32      DTN19A      24.46   0.23      DTN11       46.69   0.28
13      DTN12       20.01   0.31      DTN78       35.57   0.59      DTN12       31.13   0.35
14      DTN01       42.24   0.32      CH304-70    17.79   0.24      CH304-73    26.68   0.29

Source: Data from I. Shield, Rothamsted Research.

The full model can be written in symbolic form as

Response variable: OilYield 

Explanatory component: [1] + Line 

Structural component: Block/Plot 

The mathematical model for these observations is written as

$$OilYield_{ij} = \mu + Block_i + Line_j + e_{ij} ,$$

where $OilYield_{ij}$ is the response in the ith block (i = 1 ... 3) for the jth line
(j = 1 ... 14). The structural component generates the block effects, denoted $Block_i$ for
i = 1 ... 3, and the deviations $e_{ij}$ (equivalent to term Block.Plot). The effect of the
jth line is denoted $Line_j$. The sum-to-zero constraints take the form
$\sum_i Block_i = 0$ and $\sum_j Line_j = 0$.

The oil yields did not require transformation, and the predicted treatment means are
calculated as

$$\hat{\mu}_j = \hat{\mu} + \widehat{Line}_j .$$

These predicted means are listed in Table 8.28. The ResMS obtained from ANOVA
was 0.0039 on 26 df, leading to SEDs for treatment comparisons calculated as
√(2 × 0.0039/3) = 0.0509.
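The SED and residual df can be reproduced from the quantities quoted above. This is a minimal sketch with an illustrative function name; because it starts from the rounded ResMS of 0.0039, the result agrees with the quoted 0.0509 only to about three decimal places.

```python
import math


def sed_equal_replication(res_ms: float, n_reps: int) -> float:
    """SED for two means with n_reps replicates each: sqrt(2 * ResMS / n_reps)."""
    return math.sqrt(2.0 * res_ms / n_reps)


n_blocks, n_lines = 3, 14
# Residual df for a RCBD: (blocks - 1) * (treatments - 1).
res_df = (n_blocks - 1) * (n_lines - 1)
sed = sed_equal_replication(res_ms=0.0039, n_reps=n_blocks)
print(f"ResDF = {res_df}, SED = {sed:.4f}")
```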

The aim of the analysis is comparison of other lines with line DTN20 (with predicted
mean 0.360), which we can consider as comparisons with a control, as described in
Section 8.8.4, and so we start by using Dunnett's method. For a two-sided test, Dunnett's
method for significance level $\alpha_s = 0.05$ requires the 97.5% critical value of Dunnett's
distribution for 13 treatments and 26 df, i.e. $d^{[\alpha_s/2]}_{13,26} = 3.004$. By
rearranging the formula in Section 8.8.4, we find that treatments different to DTN20 must
satisfy one of the following conditions:

$$\hat{\mu}_j > \hat{\mu}_1 + d^{[\alpha_s/2]}_{13,26} \times \mathrm{SED} = 0.360 + 3.004 \times 0.0509 = 0.513$$
$$\hat{\mu}_j < \hat{\mu}_1 - d^{[\alpha_s/2]}_{13,26} \times \mathrm{SED} = 0.360 - 3.004 \times 0.0509 = 0.207$$



Any line with predicted oil yield > 0.513 can be considered to yield more than DTN20,
and any line with predicted oil yield < 0.207 can be considered to have a lower yield
than DTN20. This range is shown in Figure 8.10 (labelled Dunnett), and only lines
DTN31 and DTN108 are outside of this range.

TABLE 8.28

Predicted Means for 14 Breeding Lines (SED = 0.0509 on
26 df) in the Lupin Variety Trial (Example 8.7)

Line        Predicted Mean    Line      Predicted Mean
CH304-70    0.233             DTN12     0.340
CH304-73    0.300             DTN19A    0.233
DTN01       0.357             DTN19B    0.343
DTN04       0.273             DTN20     0.360
DTN10       0.320             DTN31     0.193
DTN108      0.560             DTN78     0.447
DTN11       0.317             DTN84     0.427

FIGURE 8.10

Range of predicted oil yields of a set of lupin breeding lines considered not different from line DTN20 under
different tests (Example 8.7). Lines not labelled: CH304-70 (= DTN19A), DTN10 (> DTN11), DTN19B (> DTN12)
and DTN01 (< DTN20).

Instead of using Dunnett's method, we might have considered this as an arbitrary
set of 13 pre-planned comparisons, and used a Bonferroni correction to the critical
value. Instead of using $\alpha_s = 0.05$ for each individual comparison, we would then use
$\alpha'_s = 0.05/13 = 0.00385$. For a two-sided test, we use $\alpha'_s/2$; then
$t^{[\alpha'_s/2]}_{26} = 3.174$ and lines with predicted means outside the range
(0.199, 0.521) can be considered different from DTN20. This range is very close to that
obtained from Dunnett's method, and is also shown in Figure 8.10 (labelled Bonferroni),
leading to the same conclusions.
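The Bonferroni calculation can be sketched as below, using only values quoted in the text (the critical value 3.174, the SED of 0.0509 and the DTN20 mean of 0.360). Because the SED is entered in rounded form, the limits agree with the quoted range only to within about 0.001.

```python
alpha_s = 0.05
n_comparisons = 13
alpha_adj = alpha_s / n_comparisons  # Bonferroni-adjusted level for each comparison

T_CRIT = 3.174        # critical value quoted in the text (alpha_adj / 2, 26 df)
SED = 0.0509          # SED quoted for the lupin trial
CONTROL_MEAN = 0.360  # predicted mean for DTN20

half_width = T_CRIT * SED
lower, upper = CONTROL_MEAN - half_width, CONTROL_MEAN + half_width
print(f"adjusted level = {alpha_adj:.5f}")
print(f"range = ({lower:.3f}, {upper:.3f})")
```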

Instead of using the structure of the method, we might think of treating this as a 
problem of multiple comparisons, and extract conclusions for the tests we are inter- 
ested in. The range of values considered not different from DTN20 using Tukey's simultaneous
confidence intervals is shown in Figure 8.10 (labelled Tukey CI). The range
for this method is greater, because it allows for 14 x 13/2 = 91 tests to have taken place, 
whereas we are interested in only 13 of them (each line vs DTN20). This test identifies 
only DTN108 as having a yield different to DTN20. If instead we use the LSD (Figure 
8.10, labelled LSD) then there is no allowance for multiple testing, hence no adjustment 
to the significance level and lines DTN31, DTN19A, CH304-70 and DTN108 are identi- 
fied as different to DTN20. However, if we use the adjusted LSD by substituting the 
Studentized range distribution in place of the t-distribution then the results are similar 
to Tukey's simultaneous confidence intervals (Figure 8.10, labelled aLSD). In this con- 
text, none of these procedures takes account of the number of tests of interest, and may 
either over- or under-estimate the number of differences. This mismatch illustrates the 
benefit of considering the structure of the problem rather than automatic use of multiple 
comparison procedures. 
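The conclusions from three of these procedures can be compared side by side with a short sketch based on the predicted means in Table 8.28. The Dunnett and Bonferroni critical values are those quoted in the text; the LSD critical value of 2.056 is an assumption taken from standard t-tables (97.5% point on 26 df), and Tukey's intervals are omitted because the Studentized range quantile is not quoted. The function name is illustrative.

```python
SED = 0.0509
CONTROL = "DTN20"

# Predicted means from Table 8.28.
predicted_means = {
    "CH304-70": 0.233, "CH304-73": 0.300, "DTN01": 0.357, "DTN04": 0.273,
    "DTN10": 0.320, "DTN108": 0.560, "DTN11": 0.317, "DTN12": 0.340,
    "DTN19A": 0.233, "DTN19B": 0.343, "DTN20": 0.360, "DTN31": 0.193,
    "DTN78": 0.447, "DTN84": 0.427,
}

critical_values = {
    "Dunnett": 3.004,     # quoted in the text
    "Bonferroni": 3.174,  # quoted in the text
    "LSD": 2.056,         # assumed: standard t-table value, 97.5% point on 26 df
}


def lines_different_from_control(means, control, critical_value, sed):
    """Lines whose predicted mean lies outside control mean +/- critical_value * sed."""
    half_width = critical_value * sed
    lower, upper = means[control] - half_width, means[control] + half_width
    return sorted(name for name, mean in means.items()
                  if name != control and not (lower <= mean <= upper))


results = {method: lines_different_from_control(predicted_means, CONTROL, crit, SED)
           for method, crit in critical_values.items()}
for method, lines in results.items():
    print(f"{method}: {', '.join(lines)}")
```

Passing the critical value in as data keeps the sketch independent of any particular statistical library.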

As another possibility, we might use the unadjusted LSD in combination with the
FDR to give insight into the expected number of false-positive results. Testing at
$\alpha_s = 0.05$, we identify four significant results out of 13, so the expected
false-discovery rate is 100 × 13 × 0.05/4 = 16.25%. If we wish to restrict our false-positive
rate to 5%, then the first step is to rank the 13 tests in order of the observed
significance levels associated with the t-tests, as shown in Table 8.29. Calculating
$13 \times P_{(k)}/k$ for each rank, i.e. for k = 1 ... 13, gives the last column in Table
8.29. Only the first two values are < 0.05, so to obtain an FDR of 5% we reject only the two
null hypotheses corresponding to the two smallest observed significance levels, which, in
this case, matches the conclusions from Dunnett's test.

TABLE 8.29

Calculation of Significance Level Required to Obtain FDR of 5% (Example 8.7)

Rank (k)    Line        Predicted Mean    Difference from DTN20    t         P        13 × P/k
1           DTN108      0.560              0.200                    3.933    0.001    0.007
2           DTN31       0.193             -0.167                   -3.277    0.003    0.019
3           CH304-70    0.233             -0.127                   -2.491    0.019    0.084
4           DTN19A      0.233             -0.127                   -2.491    0.019    0.063
5           DTN04       0.273             -0.087                   -1.704    0.100    0.261
6           DTN78       0.447              0.087                    1.704    0.100    0.217
7           DTN84       0.427              0.067                    1.311    0.201    0.374
8           CH304-73    0.300             -0.060                   -1.180    0.249    0.404
9           DTN11       0.317             -0.043                   -0.852    0.402    0.581
10          DTN10       0.320             -0.040                   -0.787    0.439    0.570
11          DTN12       0.340             -0.020                   -0.393    0.697    0.824
12          DTN19B      0.343             -0.017                   -0.328    0.746    0.808
13          DTN01       0.357             -0.003                   -0.066    0.948    0.948
-           DTN20       0.360              0                        -        -        -
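The FDR calculations for this example can be reproduced from the observed significance levels listed in Table 8.29. This sketch uses the P values as printed (rounded to three decimal places), so the scaled values differ slightly from the last column of the table, although the conclusion is unchanged. Variable names are illustrative.

```python
# Observed significance levels from Table 8.29, already ranked from smallest to largest.
p_ranked = [0.001, 0.003, 0.019, 0.019, 0.100, 0.100, 0.201,
            0.249, 0.402, 0.439, 0.697, 0.746, 0.948]
m = len(p_ranked)

# Observed FDR (as a percentage) when testing at 0.05 with four significant results.
observed_fdr_percent = 100 * m * 0.05 / 4

# Last column of the table: m * P(k) / k for each rank k.
scaled = [m * p / k for k, p in enumerate(p_ranked, start=1)]

# Largest rank whose scaled value is below the target FDR of 5%.
passing = [k for k, value in enumerate(scaled, start=1) if value < 0.05]
n_rejected = max(passing) if passing else 0

print(f"observed FDR = {observed_fdr_percent:.2f}%")
print(f"hypotheses rejected at FDR of 5%: {n_rejected}")
```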



8.8.6 Summary of Issues 

We have presented several approaches to multiple testing and considered specific meth- 
ods for multiple comparisons and comparisons against a control. The statistical literature 
contains many more methods and it can be difficult to decide which procedure is the most 
appropriate. Miller (1981) and Hsu (1996) provide a more detailed account of the subject. To 
finish, we discuss some controversial issues. 

There is a school of thought that states that multiple comparisons should never be per- 
formed for experiments where there is some structure within the set of treatments (see 
e.g. Bondari, 1999; Cousens, 1988; Gates, 1991; Gilligan, 1986; Madden, 1982; Pearce, 1993; 
Perry, 1986; Webster, 2007). The main concern of these authors is that multiple comparison
procedures ignore this structure, and we agree that this is a grave error. However, once the treatment
structure has been used to obtain a set of predictive terms, it can be helpful to evaluate 
comparisons within these terms, and the issue of multiple testing should then be consid- 
ered. For example, in a crossed three-way treatment structure with a significant three- 
way interaction, it may be useful to use multiple comparisons to help disentangle patterns 
within the three-way table. If only main effects were significant, then comparisons within 
predictive tables for those main effects should be evaluated, not comparisons on the whole 
three-way table. 

Somewhat different considerations arise when the response is examined after the exper- 
iment is done to decide which comparisons to test, i.e. a posteriori. In any set of treatments, 
even if there are no differences between the true population means, some groups will 
have smaller responses and some larger responses, purely because of random sampling 
variation. The eye is drawn to comparisons comprising the larger differences, and this
bias must be taken into account. This is done by adjustment of the experiment-wise error
to control for all possible tests of pairs of treatments within the set, and this adjustment is
necessary even if not all of the tests are made.

EXERCISES 

8.1 Re-analyse the data of Exercise 7.4 taking into account the crossed treatment
structure (additional factors Cultivar and Rate can be found in file inoculation.dat).
Write down the model for this analysis in both mathematical and symbolic form.
Is there any evidence for differences among cultivars or inoculation rates, and do
these factors act independently? Compare this analysis with that from Exercise 7.4:
has the crossed structure clarified your results?

8.2 An experiment designed as a RCBD with three blocks and a 2 × 2 × 2 factorial
structure investigated the effect of three factors and their interactions on the rate
of callus growth on wheat seeds. Wheat seeds were placed in separate isolation
containers with sets of eight containers, one for each of the eight treatments, kept
together in holding trays (factor Tray). The treatment factors were age of the seed
('old' or 'young', factor Age), concentration of growth media (2.5 or 5 mg, factor
Conc) and type of growth promoter (Cutlass or Rapier, factor Type). Seeds were
weighed (variate Weight) after they had been in the media for 15 days. Analyse
the seed weights; the data set is in file callus.dat. Remember to check the model
assumptions. What conclusions can you draw from this experiment?*

8.3 A field experiment investigated the effect of two seed rates (40 or 80 seeds/m²,
factor Rate) and two row spacings (12 or 36 cm, factor Spacing) on the
performance of four lupin genotypes: two determinate genotypes, A and
B, and two new dwarf-determinate genotypes, C and D (factor Genotype).
The experiment was designed as a three-block RCBD with 16 plots per block
and a 4 × 2 × 2 factorial treatment structure. At harvest the number of lupin
plants per m² was recorded (variate NoPlants). The data set can be found in
file lupindensity.dat. Analyse the densities at harvest and identify and interpret
a suitable predictive model. Do the determinate and dwarf-determinate
genotypes behave differently?†

8.4 A field experiment to investigate the effect of weed competitors on yield of winter
wheat was set up as a RCBD with three blocks of 18 plots. Three weed species
were used: chickweed (CW), black-grass (BG) and cleavers (CL). Target weed
densities were 0, 40, 80, 160, 320 and 640 plants per m² for CW and BG, and 0,
3, 6, 12, 24 and 48 plants per m² for CL. However, the weed densities achieved
were lower and differed among species. The unit numbers (ID), structural factors
(Block, Plot), species sown (factor Weed), density achieved (variate Density)
and the final yields at harvest (variate Yield, tonnes/hectare at 85% dry matter)
are given in file weedcompetition.dat. Consider whether it is appropriate to
consider density as crossed with or nested within weed species, and construct a
suitable factor for the density treatment. Analyse the data and interpret the tests
generated from your ANOVA table. What conclusions can you draw from this
trial?‡



* Data from M. Wilkinson, Rothamsted Research.
† Data from I. Shield, Rothamsted Research.
‡ Data from P. Lutman, Rothamsted Research.

8.5 Re-analyse the data of Exercise 6.3 using new factors (and information supplied
in file scab.dat) to answer the following questions: Does the addition of sulphur
affect the level of scab? Does either of the rate or timing of application affect the
level of scab? Do these two factors act independently? (We re-visit these data in
Exercise 11.4.)

8.6 An experiment assessed the effect of two lectins, Con-A and GNA, on nematode
motility. Nematodes were incubated overnight with one of the two lectins or
a buffer solution (PBS) as a control (factor Treatment). Nematodes were placed
in the centre of Petri dishes, with four dishes allocated to each treatment
completely at random. Here, we analyse the total distance moved by the nematodes
in each dish after 40 min. File nematodes.dat contains the unit numbers (DDish),
explanatory factor (Treatment) and distances moved (variate Distance). Analyse
these data and construct contrasts to assess whether (a) addition of lectins affects
nematode movement and (b) the two lectins have similar effects on movement.*

8.7 An experiment at Rothamsted Research in 1996 investigated the yield response
of forage maize to nitrogen fertilizer. The experiment was designed as a RCBD
with three blocks of four plots, with nitrogen fertilizer rates of 0, 70, 140 and
210 kg N. The whole crop forage yields from each plot (at 100% dry matter in
tonnes/hectare) are shown in Table 15.11. File forage.dat contains unit numbers
(ID), structural factors (Block, Plot), explanatory factor N and the final yields
(variate Yield). Analyse these data using ANOVA and incorporate a first-order
polynomial (linear trend) in nitrogen fertilizer rate. State your conclusions from
this analysis.†

8.8* Consider the data from the calcium pot trial of Example 4.1 (Table 4.1 and file 
calcium.dat). In this trial, the treatments A, B, C and D were concentrations of 
calcium in the soil, measured as relative concentrations of 1, 5, 10 and 20, respec- 
tively. Re-analyse these data using polynomial contrasts. Which low-order poly- 
nomial provides the best fit to these data? 

8.9 In Example 8.7, a three-block RCBD lupin breeding line experiment was
described and the resulting oil yields analysed. Now analyse the average number
of plants per square metre (variate NPlant) in a similar way. The data can be
found in Table 8.27 and file lupintrial.dat. Compare the plant density of other
lines with the line DTN20.‡



* Data from R. Curtis, Rothamsted Research/Bionemax.
† Data from P. Poulton, Rothamsted Research.
‡ Data from I. Shield, Rothamsted Research.

In this chapter we present the analysis of some common designs, introduced in Chapter 3,
that have blocking structures somewhat more complex than that of the randomized complete
block design (RCBD). Recall that the structure of an experiment describes sources of
heterogeneity among the experimental units (Section 3.1.3). Blocking is included within the
structural component of the model, which also encompasses other aspects such as the
presence of technical replicates. The blocking structure may be nested, crossed or contain
both nested and crossed components (Section 3.2). To recap, nested structures comprise
multiple units at lower levels of the experimental structure associated with a single unit at
a higher level, with no relationship between lower level units contained in different higher
level units. For example, in an experiment laid out as a RCBD (Chapter 7), plots are regarded
as lower level units, and blocks as higher level units; it is assumed that there is no
association between plots in different blocks and so plots are considered to be nested within
blocks. On the other hand, crossed blocking structures occur when lower level units are
simultaneously included within two independent higher level units associated with different
factors. For example, consider a rectangular layout of pots in a glasshouse experiment, with
both rows and columns of the layout regarded as blocking factors. Each pot is in both a row
and a column; each row contains pots from each of the columns, and vice versa. Rows and
columns are therefore considered as crossed blocking factors in this context.

Several commonly used designs incorporate specific forms of blocking structure, and
here we consider in detail three designs already introduced in Chapter 3: the Latin square
(Section 3.3.3), split-plot (Section 3.3.4) and balanced incomplete block (Section 3.3.5) designs.
For each we give a general description, state the underlying model, present the analysis of
variance, describe the comparison of treatment means, and briefly discuss some common
extensions to or variations on the basic design. We omit the mathematical expressions for
some of the parameter estimates and sums of squares in this chapter (the calculations
follow from the principles introduced in Chapters 4, 7 and 8 for simpler designs), and instead
emphasize the interpretation of results produced by statistical software. The Latin square
design, which uses two crossed blocking factors, is described first in Section 9.1, followed
by the split-plot design which, in its standard form, uses three nested blocking factors with
two crossed treatment factors applied to different levels of experimental unit (Section 9.2).
Finally, details are given for the balanced incomplete block design, a useful variant of the
RCBD when the block size is smaller than the number of treatments (Section 9.3).



9.1 The Latin Square Design 

The Latin square (LS) design was introduced in Section 3.3.3 and is used where heterogeneity
is associated with two crossed blocking factors, both with the same numbers of levels.
This design was originally used for field experiments with plots laid out on a square grid,
with heterogeneity expected across both rows and columns of the grid. The blocking factors
are therefore often referred to as rows and columns and we use these generic terms in
this section. However, the blocking factors may correspond to any two crossed sources of
heterogeneity, such as time of day or observer. Some common situations where a crossed
blocking structure may be appropriate, and where a LS design could be used, include the
following:

• Field experiments laid out with plots on a square grid, with both rows and columns
of the grid expected to contribute to heterogeneity between plots. Factors
influencing the heterogeneity could include soil characteristics (e.g. fertility or
position on a slope), management practices, or the (potential) direction of influx of
pests and diseases.

• Experiments in a glasshouse, controlled environment (CE) room or growth cabinet
where the positioning of benches, shelves and so forth, with respect to walls, doors
or light sources may introduce systematic variability (e.g. related to temperature,
humidity or light) in different directions, for example, from left to right and from
back to front (or, possibly, top to bottom).

• Laboratory experiments where there are two potential sources of variability, for
example, scientists and machines (see Example 3.3), and we are concerned about
the impacts of variation from the two sources.

The LS design is the simplest crossed blocking design suitable for such situations, and
is a special case of a more general class known as row-column designs (see Section 9.1.5).
For a LS design, the number of rows and columns (i.e. the number of levels of each blocking
factor) must equal the number of treatments and also the number of replicates of each
treatment; we denote this number by t. The treatment allocation is such that each treatment
appears exactly once in each row and once in each column, with each row and each column
containing the complete set of treatments. Estimates of treatment effects are then independent
of differences between either rows or columns, and the row, column and treatment
factors are mutually orthogonal (see Section 11.1). Overall, there are a total of $N = t \times t = t^2$
experimental units, and each treatment is replicated exactly t times. The treatment structure
associated with a LS design may comprise a single factor (with t levels), or any structure
with a total of t treatment combinations (see Chapter 8).
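The allocation rule described above (each treatment once per row and once per column) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `random_latin_square` is hypothetical, and the approach (permuting the rows, columns and labels of a cyclic square) produces a valid randomized square but does not sample uniformly from all possible Latin squares.

```python
import random

def random_latin_square(treatments, seed=None):
    """Return a t x t plan with each treatment once per row and once per column."""
    rng = random.Random(seed)
    t = len(treatments)
    # Cyclic starting square: cell (i, j) holds treatment index (i + j) mod t.
    square = [[(i + j) % t for j in range(t)] for i in range(t)]
    rng.shuffle(square)                       # randomize the order of rows
    col_order = list(range(t))
    rng.shuffle(col_order)                    # randomize the order of columns
    square = [[row[j] for j in col_order] for row in square]
    labels = list(treatments)
    rng.shuffle(labels)                       # randomize labels onto indices
    return [[labels[k] for k in row] for row in square]

plan = random_latin_square(["CH", "CL", "SH", "SL"], seed=1)
for row in plan:
    print(" ".join(row))
```

Permuting rows, columns or labels never breaks the Latin property, so the result always satisfies the constraints of the design whatever the seed.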



EXAMPLE 9.1A: LUPIN TRIAL 

In Example 3.6, we introduced an experiment devised to investigate the effects of soil 
type and water availability on the growth of individual lupin plants in pots. Because 
of potential systematic trends due to temperature and light, the rows and columns of 
the square array of pots were considered as crossed blocking factors using a LS design. 
The treatments corresponded to a 2 x 2 factorial structure with two soil types (factor 
Soil; clay, C, or sand, S) combined with two levels of water supply (factor Water; low, 
L, or high, H). Initially, we consider a single set of four treatment combinations, coded 
in factor Treatment (with labels 1 = CH, 2 = CL, 3 = SH, 4 = SL). Plant heights (cm) were
measured for each pot at the end of the experiment. The experimental layout and plant 
heights are shown in Table 9.1, with data held in file lupin.dat. 

For this experiment the number of treatments is t = 4 with a total of t x t = 16 experimental
units (pots). It is easy to verify from Table 9.1 that each treatment combination
is present once in each row and once in each column, such that each row (or column)
contains all four treatment combinations.

TABLE 9.1

Experimental Plan and Observed Response (Plant Heights, cm) for a LS Design with Two
Treatment Factors: Soil Type (C = Clay, S = Sand) and Water Availability (L = Low,
H = High) (Example 9.1A and File lupin.dat)

          Column 1    Column 2    Column 3    Column 4
Row 1     CH  19.6    SL  23.5    CL  21.7    SH  19.0
Row 2     CL  15.5    SH  22.4    CH  23.2    SL  19.3
Row 3     SH  18.5    CH  23.5    SL  26.4    CL  19.0
Row 4     SL  19.8    CL  19.8    SH  23.9    CH  20.8

Source: Data from I. Shield, Rothamsted Research.
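The verification suggested in Example 9.1A can also be done mechanically. The following is a minimal sketch, with the layout transcribed from Table 9.1; the helper name `is_latin_square` is an assumption introduced here for illustration.

```python
# Treatment layout transcribed from Table 9.1 (rows 1-4, columns 1-4).
layout = [
    ["CH", "SL", "CL", "SH"],
    ["CL", "SH", "CH", "SL"],
    ["SH", "CH", "SL", "CL"],
    ["SL", "CL", "SH", "CH"],
]

def is_latin_square(plan):
    """True if the plan is t x t with t treatments, each once per row and column."""
    t = len(plan)
    treatments = set(plan[0])
    if len(treatments) != t or any(len(row) != t for row in plan):
        return False
    rows_ok = all(set(row) == treatments for row in plan)
    cols_ok = all(set(col) == treatments for col in zip(*plan))
    return rows_ok and cols_ok

print(is_latin_square(layout))
```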



9.1.1 Defining the Model 

A model for observations from a LS design with a single treatment factor with t levels
takes the form

$$y_{ijk} = \mu + r_i + c_j + \tau_k + e_{ijk} , \qquad (9.1)$$

where $y_{ijk}$ is the observed response for the kth treatment in the ith row and jth column,
$\mu$ is the overall population mean, $r_i$ the effect of the ith row, $c_j$ the effect of the jth
column, $\tau_k$ the effect of the kth treatment and $e_{ijk}$ the deviation associated with that
observation. Note that the treatment allocated to unit ij (in the ith row and jth column) is
actually determined by the randomization of treatments to units, so in theory we do not need
another subscript to indicate the treatment applied. However, this would make the notation
more complex and so for simplicity we use the extra subscript. All of the subscripts i, j and k
run from 1 to t but because we have only $t^2$ units, not all of the combinations are present.
For example, for each ij combination, only one value of k, corresponding to the treatment
applied to that unit, will be valid. We use sum-to-zero constraints such that $\sum_i r_i = 0$,
$\sum_j c_j = 0$ and $\sum_k \tau_k = 0$.
This model can be written in our symbolic notation as 

Explanatory component: [1] + Treatment 

Structural component: Row*Column 

= Row + Column + Row.Column 

where factor Treatment labels the treatments, factor Row labels the level of the first (row) 
blocking factor and factor Column labels the level of the second (column) blocking fac- 
tor present on each unit. The Row.Column term labels the individual observations and 
corresponds to the model deviations. In practice, the single treatment term will often be 
partitioned to investigate crossed or nested structures in terms of underlying factors as 
described in Chapter 8. 



9.1.2 Estimating the Model Parameters 

The parameters associated with the model in Equation 9.1 are the overall population mean,
$\mu$, the treatment effects $\tau_k$, $k = 1 \ldots t$, and the row and column effects, $r_i$ and $c_j$
for $i, j = 1 \ldots t$. The effects are estimated with the principle of least squares (see
Section 4.2), and they take the same general form, i.e. sample means for treatments adjusted
by the sample grand mean, as in the CRD and RCBD. The orthogonality of the design means that
the effects of treatment, row and column are independent of one another. As previously, the
overall population mean, $\mu$, is estimated by the sample grand mean,

$$\hat{\mu} = \bar{y} .$$

The effect of the kth treatment is then estimated by the difference between the sample
mean for that treatment and the sample grand mean,

$$\hat{\tau}_k = \bar{y}_{\cdot\cdot k} - \bar{y} ,$$

although here we take means only over the combinations of i and j present for each treatment.
Similarly, the effect of the ith row (jth column) is estimated by the difference between
the sample mean for that row (column) and the sample grand mean,

$$\hat{r}_i = \bar{y}_{i\cdot\cdot} - \bar{y} ; \qquad \hat{c}_j = \bar{y}_{\cdot j\cdot} - \bar{y} .$$

Again, in these equations, we take means only across combinations of subscripts that are
present in the design.

The quantity $\mu_k = \mu + \tau_k$ represents the population mean for the kth treatment. The best
estimate of this population mean is then

$$\hat{\mu}_k = \hat{\mu} + \hat{\tau}_k = \bar{y} + (\bar{y}_{\cdot\cdot k} - \bar{y}) = \bar{y}_{\cdot\cdot k} ,$$

i.e. the sample mean for the kth treatment.

EXAMPLE 9.1B: LUPIN TRIAL 

The model of Equation 9.1 applies to this experiment and can be written with a single 
set of treatments in symbolic form as 

Response variable: Height

Explanatory component: [1] + Treatment 

Structural component: Row*Column 

The sample grand mean for the lupin trial is 20.99. Parameter estimates derived from the 
sample means are listed in Table 9.2. 
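As an illustration of the estimators in Section 9.1.2, the sketch below recomputes the grand mean and the row, column and treatment effects directly from the data in Table 9.1. The variable names are assumptions chosen for this example; only the data values come from the text.

```python
# Layout and plant heights (cm) transcribed from Table 9.1.
layout = [
    ["CH", "SL", "CL", "SH"],
    ["CL", "SH", "CH", "SL"],
    ["SH", "CH", "SL", "CL"],
    ["SL", "CL", "SH", "CH"],
]
heights = [
    [19.6, 23.5, 21.7, 19.0],
    [15.5, 22.4, 23.2, 19.3],
    [18.5, 23.5, 26.4, 19.0],
    [19.8, 19.8, 23.9, 20.8],
]
t = len(heights)

# Sample grand mean estimates the overall population mean.
grand_mean = sum(sum(row) for row in heights) / (t * t)

# Row and column effects: marginal mean minus grand mean.
row_effects = [sum(row) / t - grand_mean for row in heights]
col_effects = [sum(col) / t - grand_mean for col in zip(*heights)]

# Treatment effects: mean over the t units receiving each treatment, minus grand mean.
trt_effects = {}
for label in ["CH", "CL", "SH", "SL"]:
    values = [heights[i][j] for i in range(t) for j in range(t) if layout[i][j] == label]
    trt_effects[label] = sum(values) / t - grand_mean

print(round(grand_mean, 2))
print([round(e, 2) for e in row_effects])
print([round(e, 2) for e in col_effects])
print({k: round(v, 2) for k, v in trt_effects.items()})
```

Each set of effects sums to zero, in line with the sum-to-zero constraints of the model.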



9.1.3 Assessing the Importance of Individual Model Terms 

Like the RCBD (Section 4.3.1), the LS is an orthogonal design, so it is possible to uniquely 
partition the total sum of squares of the observations (TotSS) into components due to the 
different sources of variation: here, rows (RowSS), columns (ColSS), treatments (TrtSS) and 
background variation (ResSS), so that 



TotSS = RowSS + ColSS + TrtSS + ResSS .

TABLE 9.2

Parameter Estimates (Row, Column and Treatment Effects) for the Lupin Trial (Example 9.1B)

Rows                     Columns                  Treatments
$\hat{r}_1$    -0.04     $\hat{c}_1$    -2.64     $\hat{\tau}_1$     0.78
$\hat{r}_2$    -0.89     $\hat{c}_2$     1.31     $\hat{\tau}_2$    -1.99
$\hat{r}_3$     0.86     $\hat{c}_3$     2.81     $\hat{\tau}_3$    -0.04
$\hat{r}_4$     0.08     $\hat{c}_4$    -1.47     $\hat{\tau}_4$     1.26

These sums of squares can be written in terms of the parameter estimates, as

$$\mathrm{RowSS} = t\sum_{i=1}^{t}\hat{r}_i^{\,2} ; \quad \mathrm{ColSS} = t\sum_{j=1}^{t}\hat{c}_j^{\,2} ; \quad \mathrm{TrtSS} = t\sum_{k=1}^{t}\hat{\tau}_k^{\,2} ; \quad \mathrm{ResSS} = \sum_{i,j,k}\hat{e}_{ijk}^{\,2} ,$$

where the summation for ResSS is made over combinations of i, j and k that are present in
the design; this can be achieved by summation over any two of the indices. There is a
corresponding partition of the total degrees of freedom as

TotDF = RowDF + ColDF + TrtDF + ResDF .

The total number of df are computed as

$$\mathrm{TotDF} = N - 1 = t^2 - 1 ,$$

with the same number of df for each of the row, column and treatment terms, i.e.

$$\mathrm{RowDF} = \mathrm{ColDF} = \mathrm{TrtDF} = t - 1 .$$

The residual df (ResDF) are most easily obtained by subtraction as

$$\mathrm{ResDF} = (t^2 - 1) - (t - 1) - (t - 1) - (t - 1) = t^2 - 3t + 2 = (t - 1) \times (t - 2) .$$

As usual, we calculate the mean square for each term by division of its sum of squares by 
its degrees of freedom (Section 4.3). If any of the sets of row, column or treatment effects 
are uniformly zero, then the corresponding mean squares are attributable solely to back- 
ground variation. The variance ratios required to test null hypotheses of zero effects are 
therefore calculated as the appropriate mean square divided by the residual mean square, 
and these variance ratios are compared with the percentiles of an F-distribution with t - 1 
numerator and (t - 1) x (t - 2) denominator df. 

To construct a multi-stratum ANOVA table, we need to recognize the strata in this
design (see Section 7.5 for an introduction to this concept). The LS design has three strata,
corresponding to rows (factor Row), columns (factor Column) and the individual units
(Row.Column). The multi-stratum ANOVA table shown in Table 9.3 is partitioned according
to this structure. Because the complete set of treatments appears in each row and in
each column of the design, there is no information on treatment effects in comparisons
between either rows or columns, and hence within either the Row or Column strata. As
treatments are applied to the units defined by the combinations of rows and columns,
variation in the Row.Column stratum is partitioned into the variation associated with the
Treatment factor and residual variation.

TABLE 9.3

Structure of the Multi-Stratum ANOVA Table for a LS Design with t Rows (Factor Row),
Columns (Factor Column) and Treatments (Factor Treatment), and a Total of $N = t^2$ Units

Source of Variation     df              Sum of Squares   Mean Square                      Variance Ratio
Row stratum
  Residual              t - 1           RowSS            RowMS = RowSS/(t - 1)            RowMS/ResMS
Column stratum
  Residual              t - 1           ColSS            ColMS = ColSS/(t - 1)            ColMS/ResMS
Row.Column stratum
  Treatment             t - 1           TrtSS            TrtMS = TrtSS/(t - 1)            TrtMS/ResMS
  Residual              (t - 1)(t - 2)  ResSS            ResMS = ResSS/[(t - 1)(t - 2)]
Total                   N - 1           TotSS

Recall that, in addition to testing hypotheses about treatment effects, we can also use the 
multi-stratum ANOVA table to test hypotheses about terms in the structural component. 
Units within a higher-level stratum can always be constructed from units at some lower level 
(e.g. rows consist of a set of plots). If there is no heterogeneity at the higher level, then the 
ratio of the residual variances for these strata, as estimated by their mean squares, should be 
close to unity. In the LS design, we can compare variation between rows with background 
variation using the variance ratio RowMS/ResMS (Table 9.3). Under the null hypothesis that 
the row effects are all zero, this variance ratio is distributed as an F-distribution with t - 1 
numerator and (t - 1) x (t - 2) denominator df. An analogous test, based on ColMS/ResMS, 
can be made for column effects. Recall that these tests are made to give information on 
the major sources of variation present in the structure, and can give information useful in 
designing future experiments; they are not used to refine the predictive model.

EXAMPLE 9.1C: LUPIN TRIAL 

The multi-stratum ANOVA table for this model is shown in Table 9.4. 

The residual plots (not shown) indicate no obvious violations of the assumptions
(Section 5.2). The variance ratio for the treatment mean square gives strong evidence of
differences between treatments ($F_{3,6}$ = 12.667, P = 0.005).

There is no evidence of differences between rows (corresponding to an expected light
gradient, $F^{R}_{3,6}$ = 3.165, P = 0.107), but there is strong evidence of differences between
columns (corresponding to an expected temperature gradient, $F^{C}_{3,6}$ = 38.478, P < 0.001).

This suggests that columns are likely to be an important source of structural variation 
for any future experiments in this environment. Although row variation is not signifi- 
cant for these data, it should still be allowed for in future experiments if previous expe- 
rience suggests that it is sometimes substantial. 
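The entries of the ANOVA table for this example can be reproduced from the formulae in Section 9.1.3. The sketch below is an illustration only: the function name `latin_square_anova` and its return structure are assumptions, and P-values are omitted because they would need an F-distribution routine from outside the standard library.

```python
# Layout and plant heights (cm) transcribed from Table 9.1.
layout = [
    ["CH", "SL", "CL", "SH"],
    ["CL", "SH", "CH", "SL"],
    ["SH", "CH", "SL", "CL"],
    ["SL", "CL", "SH", "CH"],
]
heights = [
    [19.6, 23.5, 21.7, 19.0],
    [15.5, 22.4, 23.2, 19.3],
    [18.5, 23.5, 26.4, 19.0],
    [19.8, 19.8, 23.9, 20.8],
]

def latin_square_anova(plan, y):
    """Sums of squares, residual mean square and variance ratios for a t x t LS."""
    t = len(y)
    grand = sum(sum(row) for row in y) / (t * t)

    def effects_ss(means):
        # t times the sum of squared estimated effects (mean minus grand mean).
        return t * sum((m - grand) ** 2 for m in means)

    totals = {}
    for plan_row, y_row in zip(plan, y):
        for label, value in zip(plan_row, y_row):
            totals[label] = totals.get(label, 0.0) + value

    ss = {
        "Row": effects_ss([sum(row) / t for row in y]),
        "Column": effects_ss([sum(col) / t for col in zip(*y)]),
        "Treatment": effects_ss([total / t for total in totals.values()]),
    }
    tot_ss = sum((value - grand) ** 2 for row in y for value in row)
    res_ss = tot_ss - sum(ss.values())            # residual SS by subtraction
    res_ms = res_ss / ((t - 1) * (t - 2))         # ResDF = (t - 1)(t - 2)
    ratios = {name: (value / (t - 1)) / res_ms for name, value in ss.items()}
    return ss, tot_ss, res_ss, res_ms, ratios

ss, tot_ss, res_ss, res_ms, ratios = latin_square_anova(layout, heights)
for name in ss:
    print(f"{name:10s} SS = {ss[name]:8.3f}  variance ratio = {ratios[name]:7.3f}")
print(f"Residual   SS = {res_ss:8.3f}  MS = {res_ms:.3f}")
print(f"Total      SS = {tot_ss:8.3f}")
```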

TABLE 9.4

Multi-Stratum ANOVA Table for the Lupin Trial with Four Rows, Columns and
Treatments (Factors Row, Column and Treatment, Respectively) (Example 9.1C)

Source of Variation     df    Sum of Squares   Mean Square   Variance Ratio       P
Row stratum
  Residual               3         6.162          2.054      $F^{R}$ = 3.165      0.107
Column stratum
  Residual               3        74.912         24.971      $F^{C}$ = 38.478   < 0.001
Row.Column stratum
  Treatment              3        24.662          8.221      $F$ = 12.667         0.005
  Residual               6         3.894          0.649
Total                   15       109.629

This analysis ignores the underlying treatment structure, and a more appropriate
analysis uses a two-way crossed explanatory structure (Section 8.2) in terms of the
underlying factors, Soil and Water, written as

Explanatory component: [1] + Soil*Water

= [1] + Soil + Water + Soil.Water

The treatment sum of squares can then be partitioned into components for the two 
main effects and the interaction. The resulting ANOVA table is shown in Table 9.5. The 
sum of the main effect and interaction sums of squares is equal to the combined TrtSS 
in Table 9.4. 

Starting at the bottom of the ANOVA table, there is strong evidence of a treatment
interaction ($F^{S.W}_{1,6}$ = 25.588, P = 0.002), indicating that the effect of soil type depends
on the amount of water supplied. The presence of this interaction means that predictions
should be based on all model terms, and that main effects might not be easily
interpreted. Nevertheless, growth appears to differ between soil types ($F^{S}_{1,6}$ = 9.062,
P = 0.024), with no overall effect of water supply ($F^{W}_{1,6}$ = 3.352, P = 0.117).
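The partition of the treatment sum of squares into main effects and interaction can be sketched as follows. The helper `factor_ss` is a hypothetical name; it relies on the treatment labels in Table 9.1 having the soil type as their first character and the water level as their second, and the interaction is obtained by subtraction, which is valid here because the design is orthogonal.

```python
# Layout and plant heights (cm) transcribed from Table 9.1.
layout = [
    ["CH", "SL", "CL", "SH"],
    ["CL", "SH", "CH", "SL"],
    ["SH", "CH", "SL", "CL"],
    ["SL", "CL", "SH", "CH"],
]
heights = [
    [19.6, 23.5, 21.7, 19.0],
    [15.5, 22.4, 23.2, 19.3],
    [18.5, 23.5, 26.4, 19.0],
    [19.8, 19.8, 23.9, 20.8],
]
t = len(heights)
grand = sum(sum(row) for row in heights) / (t * t)

def factor_ss(level_of):
    """Sum over levels of replication x (level mean - grand mean) squared."""
    groups = {}
    for i in range(t):
        for j in range(t):
            groups.setdefault(level_of(i, j), []).append(heights[i][j])
    return sum(len(v) * (sum(v) / len(v) - grand) ** 2 for v in groups.values())

row_ss = factor_ss(lambda i, j: i)
col_ss = factor_ss(lambda i, j: j)
trt_ss = factor_ss(lambda i, j: layout[i][j])
soil_ss = factor_ss(lambda i, j: layout[i][j][0])    # first character: C or S
water_ss = factor_ss(lambda i, j: layout[i][j][1])   # second character: H or L
interaction_ss = trt_ss - soil_ss - water_ss         # Soil.Water by subtraction

tot_ss = sum((y - grand) ** 2 for row in heights for y in row)
res_ms = (tot_ss - row_ss - col_ss - trt_ss) / ((t - 1) * (t - 2))

# Each term has 1 df, so the mean square equals the sum of squares.
f_soil = soil_ss / res_ms
f_water = water_ss / res_ms
f_interaction = interaction_ss / res_ms
print(round(soil_ss, 3), round(water_ss, 3), round(interaction_ss, 3))
print(round(f_soil, 3), round(f_water, 3), round(f_interaction, 3))
```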



TABLE 9.5

Multi-Stratum ANOVA Table for the Lupin Trial Using the Two-Way Crossed
Explanatory Structure Soil*Water (Example 9.1C)

Source of Variation     df    Sum of Squares   Mean Square   Variance Ratio         P
Row stratum
  Residual               3         6.162          2.054      $F^{R}$ = 3.165        0.107
Column stratum
  Residual               3        74.912         24.971      $F^{C}$ = 38.478     < 0.001
Row.Column stratum
  Soil                   1         5.881          5.881      $F^{S}$ = 9.062        0.024
  Water                  1         2.176          2.176      $F^{W}$ = 3.352        0.117
  Soil.Water             1        16.606         16.606      $F^{S.W}$ = 25.588     0.002
  Residual               6         3.894          0.649
Total                   15       109.629



9.1.4 Evaluating the Response to Treatments: Predictions from the Fitted Model

For an unstructured set of treatments, the best estimate of the population mean for the kth
treatment is the treatment sample mean, i.e. $\hat{\mu}_k = \bar{y}_{\cdot\cdot k}$ (Section 9.1.2). Uncertainty
associated with this estimate is measured by its estimated SE,

$$\widehat{\mathrm{SE}}(\hat{\mu}_k) = \sqrt{\frac{s^2}{t}} ,$$

using the estimate of background variation, $s^2$ = ResMS, in place of the unknown true value.
This can be used to form a $100(1 - \alpha_s)\%$ CI for the treatment population mean as

$$\left( \hat{\mu}_k - t^{[\alpha_s/2]}_{(t-1)(t-2)} \times \widehat{\mathrm{SE}}(\hat{\mu}_k) ,\; \hat{\mu}_k + t^{[\alpha_s/2]}_{(t-1)(t-2)} \times \widehat{\mathrm{SE}}(\hat{\mu}_k) \right) ,$$

where $t^{[\alpha_s/2]}_{(t-1)(t-2)}$ is the $100(1 - \alpha_s/2)$th percentile of the Student's t-distribution with
(t - 1) x (t - 2) df.

The best estimate of the difference between population means for the kth and sth treatments
is provided by the difference between their respective sample means, i.e.

$$\hat{\mu}_k - \hat{\mu}_s = \bar{y}_{\cdot\cdot k} - \bar{y}_{\cdot\cdot s} ,$$

and the estimate of the standard error of this difference is

$$\mathrm{SED} = \widehat{\mathrm{SE}}(\hat{\mu}_k - \hat{\mu}_s) = \sqrt{\frac{2 s^2}{t}} ,$$

with corresponding $\mathrm{LSD} = \mathrm{SED} \times t^{[\alpha_s/2]}_{(t-1)(t-2)}$. As usual, the statistic

$$t = \frac{\hat{\mu}_k - \hat{\mu}_s}{\mathrm{SED}}$$

has a t-distribution with degrees of freedom equal to the ResDF, here (t - 1) x (t - 2), and
can be used to evaluate the null hypothesis of equality of the two treatment population
means against a two-sided alternative hypothesis. The corresponding $100(1 - \alpha_s)\%$ confidence
interval for this treatment difference can be computed as

$$\left( (\hat{\mu}_k - \hat{\mu}_s) - \mathrm{LSD} ,\; (\hat{\mu}_k - \hat{\mu}_s) + \mathrm{LSD} \right) .$$

Predictions for crossed or nested structures within the set of treatments can be derived
from the individual predictions as described in Chapter 8.



EXAMPLE 9.1D: LUPIN TRIAL 

Table 9.6 shows predicted population means for each treatment combination. As the
interaction term is statistically significant, this table is the most appropriate summary
of this experiment (see Section 8.2.4), and the predictions and their SE are the same as
would be obtained from use of the combined factor Treatment. The SE for the individual
predictions is equal to 0.403, calculated using the ResMS = $s^2$ = 0.649 from Table 9.5 as

$$\widehat{\mathrm{SE}}(\hat{\mu}_k) = \sqrt{\frac{0.649}{4}} = 0.403 .$$

The ResDF is (t - 1) x (t - 2) = 3 x 2 = 6 df. The SED for comparisons between pairs of
individual treatments is $\sqrt{2 \times 0.649/4}$ = 0.570, with a 5% (two-sided) LSD calculated as

$$\mathrm{LSD} = t^{[0.025]}_{6} \times \mathrm{SED} = 2.447 \times 0.570 = 1.394 .$$

TABLE 9.6

Predicted Population Means (SE = 0.403, SED = 0.570 on 6 df) for the Lupin Trial
(Example 9.1D)

              Water Availability
Soil Type       H         L
C             21.77     19.00
S             20.95     22.25

In combination with the LSD, the predictions in Table 9.6 indicate that there is no real
difference in growth in sandy soil (Soil = S) across the two levels of water availability,
but that growth is significantly reduced in clay soil (Soil = C) when less water is available
(Water = L).
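The arithmetic in Example 9.1D can be checked with a short sketch. The ResMS, the predicted means and the critical value 2.447 are taken directly from the text; the variable names are assumptions made for illustration, and the critical value is hard-coded rather than computed so that no statistics library is needed.

```python
import math

res_ms = 0.649        # residual mean square, s^2 (Table 9.5)
t = 4                 # number of treatments = replicates per treatment
t_crit = 2.447        # 97.5th percentile of Student's t on 6 df, as quoted in the text

se = math.sqrt(res_ms / t)          # SE of a single predicted treatment mean
sed = math.sqrt(2 * res_ms / t)     # SE of a difference between two means
lsd = t_crit * sed                  # 5% two-sided least significant difference

# Predicted means from Table 9.6.
means = {"CH": 21.77, "CL": 19.00, "SH": 20.95, "SL": 22.25}
clay_diff = means["CH"] - means["CL"]    # high minus low water, clay soil
sand_diff = means["SH"] - means["SL"]    # high minus low water, sandy soil

print(f"SE = {se:.3f}, SED = {sed:.3f}, LSD = {lsd:.3f}")
print(f"Clay: difference {clay_diff:.2f}, exceeds LSD: {abs(clay_diff) > lsd}")
print(f"Sand: difference {sand_diff:.2f}, exceeds LSD: {abs(sand_diff) > lsd}")
```

Only the clay-soil comparison exceeds the LSD, which matches the interpretation given above.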



9.1.5 Constraints and Extensions of the Latin Square Design 

The LS design has several serious disadvantages that make its use impractical in many cir- 
cumstances. The numbers of rows, columns and treatments must be equal, but in practice 
it might not be possible to construct realistic blocks of the required size in both dimensions 
(rows and columns). In addition, when the number of treatments is small, the ResDF are 
also small. For example, if the number of treatments is three, four or five we have two, six 
and 12 ResDF, respectively. This means that the estimate of background variability is likely 
to be poor and the power to detect real treatment differences will be reduced (see Chapter 
10 for more discussion about statistical power). However, there are various extensions of
LS designs that ease these restrictions, and we discuss some briefly here. 

When the number of treatments is small, one way to increase the ResDF is to use mul- 
tiple squares of the same size, with each square having a different randomization. The 
squares may either be considered as independent, with separate rows and columns, or as 
linked, with rows or columns shared across the squares to form a Latin rectangle design. 
The advantage of linked squares is that common effects can be used for the shared rows 
or columns, which further increases ResDF, as demonstrated in the following examples. 



EXAMPLE 9.2: INDEPENDENT LATIN SQUARES 

An experiment was set up to investigate the effect of petal colour on the influx of pollen 
beetles into a crop of oilseed rape. Five different shades of petal colour were considered, 
and a LS design was used to account for the unknown direction of migration into the 
crop. Previous studies had found much spatial variation in beetle counts, and so two 
replicates of the LS design were used to increase the precision of treatment compari- 
sons. The two squares had the same orientation in adjacent fields, but common row or 
column effects could not reasonably be expected, and so the squares were regarded as 
independent. The experimental plan is shown in Table 9.7a. The structural component 
of the model takes the form 

Structural component: Field/(Row*Column)

= Field + Field.Row + Field.Column + Field.Row.Column

A dummy multi-stratum ANOVA table, showing the sources of variation with their df,
is presented in Table 9.8a. There are four strata, corresponding to the two fields (Field),
rows and columns within fields (Field.Row and Field.Column) and the individual plots
within fields (deviations, indexed by Field.Row.Column combinations). The ResDF is
now 28 compared with only 12 for a single 5 x 5 square.

TABLE 9.7

Designs using (a) Two Independent LSs (Example 9.2) with Separate Rows and
Columns in Different Fields; (b) Two Linked LSs (Example 9.3) with Position within
Stacks Considered as Common across Replicates

(a)                                    (b)

Field 1        Column                  Rep 1          Position within stack
               1   2   3   4   5                      1   2   3   4   5
Row      1     E   C   A   D   B       Stack    1     E   D   A   B   C
         2     A   D   B   E   C                2     D   C   E   A   B
         3     B   E   C   A   D                3     A   E   B   C   D
         4     C   A   D   B   E                4     B   A   C   D   E
         5     D   B   E   C   A                5     C   B   D   E   A

Field 2        Column                  Rep 2          Position within stack
               1   2   3   4   5                      1   2   3   4   5
Row      1     E   C   D   B   A       Stack    1     A   C   E   D   B
         2     A   D   E   C   B                2     D   A   C   B   E
         3     D   B   C   A   E                3     E   B   D   C   A
         4     C   A   B   E   D                4     C   E   B   A   D
         5     B   E   A   D   C                5     B   D   A   E   C

TABLE 9.8

Dummy Multi-Stratum ANOVA Tables for Two Replicates of a 5 x 5 LS with
(a) Independent Squares (Example 9.2) in Different Fields and (b) Linked
Squares Using Common Position Effects within Stacks (Example 9.3)

(a)                                    (b)
Source of Variation          df        Source of Variation           df
Field stratum                          Rep stratum
  Residual                    1          Residual                     1
Field.Row stratum                      Position stratum
  Residual                    8          Residual                     4
Field.Column stratum                   Rep.Stack stratum
  Residual                    8          Residual                     8
Field.Row.Column stratum               Rep.Stack.Position stratum
  Treatment                   4          Treatment                    4
  Residual                   28          Residual                    32
Total                        49        Total                         49

EXAMPLE 9.3: LINKED LATIN SQUARES

An experiment was required to investigate the growth of different strains of fungus on
a new substrate. Five strains of the fungus were available, each applied to 10 dishes. The
dishes were to be held in vertical stacks of five dishes within a CE cabinet. The investigator
expected that position within each stack would affect growth rates, as well as the
location of each stack on the shelf. Stack and Position were two independent sources of
heterogeneity, and so the experiment was designed as two replicates of a 5 x 5 LS, with
both replicates placed on the same shelf. The experimental plan is set out in Table 9.7b.

Since the effect of position within stack was expected to be the same across all stacks,
the two squares were considered linked, with position effects held in common. The
structural component takes the form

Structural component: Rep + Position + Rep.Stack + Rep.Stack.Position

A dummy multi-stratum ANOVA table is shown in Table 9.8b. This table also has four strata,
corresponding to the two replicates (Rep), stacks within replicates (Rep.Stack), position
within stack (Position), and the individual dishes (indexed by Rep.Stack.Position
combinations). As position effects are held in common across squares (replicates),
fewer effects are fitted than would be the case if these effects were expected to differ
across squares (replicates), and the additional df are passed into the ResDF in the lowest
stratum; now the ResDF equals 32 compared with 28 for the independent squares
of Example 9.2.
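The df bookkeeping behind Examples 9.2 and 9.3 can be written out as a small sketch. The function names are hypothetical, and the formulae simply subtract the df of each fitted term from the total, following the strata listed in Table 9.8; `s` denotes the number of squares.

```python
def res_df_single(t):
    """Residual df for one t x t Latin square."""
    return (t - 1) * (t - 2)

def res_df_independent(t, s):
    """Residual df for s independent t x t squares (separate rows and columns)."""
    total = s * t * t - 1
    squares = s - 1            # e.g. Field
    rows = s * (t - 1)         # rows within each square
    columns = s * (t - 1)      # columns within each square
    treatments = t - 1
    return total - squares - rows - columns - treatments

def res_df_linked(t, s):
    """Residual df for s linked squares sharing one blocking factor (e.g. Position)."""
    total = s * t * t - 1
    squares = s - 1            # e.g. Rep
    shared = t - 1             # common effects across squares
    separate = s * (t - 1)     # e.g. stacks within each replicate
    treatments = t - 1
    return total - squares - shared - separate - treatments

print([res_df_single(t) for t in (3, 4, 5)])
print(res_df_independent(5, 2), res_df_linked(5, 2))
```

Sharing one blocking factor across the squares frees (s - 1)(t - 1) df, which is the difference of four between the two designs when s = 2 and t = 5.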

There are some situations in which we can extend the constraints provided by a LS design 
to provide real benefits in controlling the impacts of adjacent treatments. For example, in 
insect pheromone trials, neighbouring treatments can interfere because of movement of 
the pheromone plumes by wind. Neighbour-balanced LS designs, also known as complete
or quasi-complete LS designs, are useful in such situations where rows and columns
reflect the physical layout of the experiment. By balancing the occurrence of neighbouring
pairs of treatments (so that each treatment occurs adjacent to each other treatment the
same number of times within both rows and columns), these designs ensure that no individual
treatment has an unfair advantage (or disadvantage) over others due to a lucky (or
unlucky) allocation of neighbours.

EXAMPLE 9.4: NEIGHBOUR-BALANCED LATIN SQUARE 

A design was required to investigate strategies for pest control on a crop of field beans. 

The treatments were two semio-chemicals with repellent qualities, three field margin 
mixtures as a trap crop, and an untreated control. If one uses small plots, there is a dan- 
ger of interaction (or contamination) between neighbouring treatments. For example, if 
the semio-chemicals are very effective, they might also repel pests from neighbouring 
plots. Conversely, a large pest population on the untreated control plots might start 
moving into neighbouring plots. A neighbour-balanced LS was used to even out any 
such interference. The experimental plan is set out in Table 9.9. It is straightforward to 
verify that each pair of treatments occurs as neighbours twice within rows and twice 
within columns. 
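The neighbour-balance property claimed for Table 9.9 can be checked by counting adjacent pairs. The sketch below is illustrative: the plan is transcribed from Table 9.9, and the helper name `neighbour_counts` is an assumption.

```python
from collections import Counter
from itertools import combinations

# Plan transcribed from Table 9.9 (rows 1-6, columns 1-6).
plan = [
    ["S2", "S1", "M3", "M1", "M2", "C"],
    ["S1", "M1", "S2", "C", "M3", "M2"],
    ["M3", "S2", "M2", "S1", "C", "M1"],
    ["M1", "C", "S1", "M2", "S2", "M3"],
    ["M2", "M3", "C", "S2", "M1", "S1"],
    ["C", "M2", "M1", "M3", "S1", "S2"],
]

def neighbour_counts(lines):
    """Count how often each unordered pair of treatments is adjacent within a line."""
    counts = Counter()
    for line in lines:
        for left, right in zip(line, line[1:]):
            counts[frozenset((left, right))] += 1
    return counts

row_counts = neighbour_counts(plan)
col_counts = neighbour_counts([list(col) for col in zip(*plan)])

pairs = [frozenset(pair) for pair in combinations(sorted(plan[0]), 2)]
balanced = all(row_counts[p] == 2 and col_counts[p] == 2 for p in pairs)
print(len(pairs), balanced)
```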

TABLE 9.9

Neighbour-Balanced LS Design for Six Treatments: Control (C), Three Margin Mixtures (M1, M2,
M3) and Two Repellent Semio-Chemicals (S1, S2) (Example 9.4)

                     Column
Row       1     2     3     4     5     6
1         S2    S1    M3    M1    M2    C
2         S1    M1    S2    C     M3    M2
3         M3    S2    M2    S1    C     M1
4         M1    C     S1    M2    S2    M3
5         M2    M3    C     S2    M1    S1
6         C     M2    M1    M3    S1    S2

Another design extending the constraints of the LS is the Graeco-Latin (or Euler)
square design, which is an orthogonal combination of two LS designs. This design allows
the independent assessment of the effects of two t-level treatment factors, with each of the
$t^2$ treatment combinations occurring once within a single Graeco-Latin square. The main
effects of the two t-level treatment factors can be estimated but, as there is no replication
of the individual treatment combinations, it is not possible to test for the presence of a
treatment interaction. This design should therefore be used only when prior knowledge
suggests that no interaction will occur. A common use of these designs is for perennial
crops, where different treatments may be applied in consecutive years, with the possibility
of carry-over effects from treatments applied in previous years, but where we expect no
interaction between previous and current treatments. These designs can also be adapted
to situations where there are three (crossed) blocking factors and one treatment factor, all
with t levels. Graeco-Latin square designs are even more restrictive than LS designs, and
have even fewer residual degrees of freedom - for example, a single 4 x 4 Graeco-Latin
square design has only 3 ResDF. Note that for some values of t (namely 2 and 6) it is not
possible to construct Graeco-Latin square designs.
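To make the idea of an orthogonal combination of two Latin squares concrete, the sketch below builds a pair of squares for odd t using the cyclic rules (i + j) mod t and (i + 2j) mod t and then checks that every combination of levels occurs exactly once. This is one simple construction chosen for illustration, not necessarily the one used in practice; the function names are assumptions, and the rule does not give a valid second square when t is even.

```python
def cyclic_pair(t):
    """Two t x t squares from the rules (i + j) mod t and (i + 2j) mod t."""
    first = [[(i + j) % t for j in range(t)] for i in range(t)]
    second = [[(i + 2 * j) % t for j in range(t)] for i in range(t)]
    return first, second

def is_latin(square):
    """True if every symbol occurs once in each row and each column."""
    symbols = set(range(len(square)))
    return (all(set(row) == symbols for row in square)
            and all(set(col) == symbols for col in zip(*square)))

def is_graeco_latin(first, second):
    """True if both squares are Latin and all t^2 level pairs occur exactly once."""
    t = len(first)
    pairs = {(first[i][j], second[i][j]) for i in range(t) for j in range(t)}
    return is_latin(first) and is_latin(second) and len(pairs) == t * t

first, second = cyclic_pair(5)
print(is_graeco_latin(first, second))

# Residual df for a single t x t Graeco-Latin square: total minus rows,
# columns and two t-level treatment factors.
graeco_res_df = lambda t: (t * t - 1) - 4 * (t - 1)
print(graeco_res_df(4))
```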

The LS design is a particular example of a row-column design, a class that includes 
more general designs with two crossed blocking factors. General row-column designs 
need not have equal numbers of treatments, rows and columns, and usually do not have an 
orthogonal structure. The simplest modification to a LS design might involve the removal 
of just a single row (or column) to produce an incomplete LS design, whilst the addition 
of a single row or column produces an extended LS design. Whilst these designs with 
one row or column deleted (or added) are no longer orthogonal, they are still balanced 
for treatment comparisons, and so can be analysed by standard multi-stratum ANOVA 
algorithms (see Section 11.6). As the discrepancy between the number of treatments and 
the number of replicates grows, however, it can be more challenging to find balanced row- 
column designs. Further details on the construction and analysis of general row-column 
designs can be found in Mead et al. (2012, Chapter 8). 



9.2 The Split-Plot Design 

In some experimental situations, the natural scale of experimental unit varies between dif- 
ferent treatment factors - examples include the following: 

• Field experiments in which machinery constraints apply, for example, irrigation 
treatments often have to be set up for a large area, but varieties can be sown on 
much smaller plots. 

• CE experiments where different regimes of temperature or lighting must be 
applied to whole rooms (or cabinets) whilst levels of other factors, such as plant 
variety or watering, can be applied within rooms (e.g. to plants in pots). 

One approach in these situations would be to apply all treatment factors at the coarser
scale, but this quickly leads to experiments requiring substantially more resources than
are usually available. A better alternative is the split-plot (SP) design, introduced in Section
3.3.4, which is appropriate for factorial experiments where levels of one or more factors
must be applied to larger experimental units while the levels of the other factor(s) can be
applied to smaller units. It is worth noting, however, that a SP design should be used only
when real constraints on the scale of treatment application are present, as the design is less
efficient than the corresponding RCBD based on the same number of experimental units.
SP designs occasionally arise as a means of modifying an existing experiment to enable
the addition of a new treatment factor.

Similarly to the LS, the SP design had its origins in field experimentation, but it is much
more widely applicable. We first consider one form of SP design, which we call the standard
form, and discuss variations later. The standard SP design uses two treatment factors,
A and B, with a factorial structure and a three-level nested structure for the experimental
units. We assume that factor A can be applied only to large units but that factor B can be
applied to smaller units, and that a crossed model (Section 8.2) is appropriate for these
factors. The highest level of the structure corresponds to complete replicates of the set
of treatments, and we denote this level as blocks. Each block is then divided into several
whole plots (sometimes called main plots), with levels of treatment factor A randomized to
the whole plots separately within each block (equivalent to the randomization of a single
treatment factor in a RCBD). Finally, each whole plot is divided into several subplots, and
the levels of factor B are randomized onto subplots within each whole plot (just as if we
were considering the whole plots as blocks in a RCBD for factor B). Because the two treatment
factors, A and B, are applied within different strata, the main effects of factors A
and B are assessed against different levels of background variation (between whole plots
and between subplots, respectively, see Section 9.2.2) and hence estimated with different
precision.
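
The two-stage randomization described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `randomize_split_plot`, the tuple layout and the use of a fixed seed are assumptions made for the example, not part of the original text.

```python
import random


def randomize_split_plot(m, levels_a, levels_b, seed=None):
    """Return (block, whole_plot, subplot, level_a, level_b) tuples
    for a standard split-plot design with m blocks."""
    rng = random.Random(seed)
    plan = []
    for block in range(1, m + 1):
        # Stage 1: randomize levels of A to whole plots, separately in each block.
        a_order = rng.sample(levels_a, k=len(levels_a))
        for whole_plot, a in enumerate(a_order, start=1):
            # Stage 2: randomize levels of B to subplots, separately in each whole plot.
            b_order = rng.sample(levels_b, k=len(levels_b))
            for subplot, b in enumerate(b_order, start=1):
                plan.append((block, whole_plot, subplot, a, b))
    return plan


# Hypothetical usage: 4 blocks, 2 whole-plot treatments, 4 subplot treatments.
plan = randomize_split_plot(4, ["No", "Yes"], ["-", "Am", "Ga", "Sm"], seed=1)
for row in plan[:8]:
    print(row)
```

Each whole plot receives a single level of factor A, and every level of factor B appears once within every whole plot, which is what places the two factors in different strata.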

In general notation, in the standard SP design there are $t$ treatments, formed from all
factorial combinations of two treatment factors, A and B, where factor A has $t_A$ levels and
factor B has $t_B$ levels, and $t = t_A \times t_B$. The number of blocks is denoted $m$, and the number of
whole plots in each block must be equal to $t_A$ (one for each level of factor A) giving a total
of $m \times t_A$ whole plots. The number of subplots per whole plot must then be equal to $t_B$ (one
for each level of factor B) giving a total of $N = m \times t_A \times t_B$ subplots. Each level of factor A
will be present on one whole plot in each of the blocks, and each level of factor B will be
present on one subplot within each whole plot. The replication for each level of factor A is
$m$ main plots whilst the replication for each level of factor B is $m \times t_A$ subplots. Finally, the
replication for each of the $t$ individual treatment combinations is $m$ subplots.

EXAMPLE 9.5A: WEED COMPETITION EXPERIMENT 

A field experiment using a SP design to investigate the competitive effects of weeds, 
with and without irrigation, on the yield of winter wheat was introduced in Example 
3.7. The experiment used two irrigation regimes (non-irrigated or irrigated) in combi- 
nation with three different weed species: Alopecurus myosuroides (black-grass), Galium 
aparine (cleavers) and Stellaria media (chickweed), abbreviated to Am, Ga and Sm, respec- 
tively, and a negative control (no weeds). The SP design was used because the irrigation 
regimes could be applied only to larger areas of land. The experiment had four blocks 
(m = 4), with irrigation regimes applied to whole plots within each block (i.e. two whole
plots per block, $t_A = 2$), and different weed species were sown in subplots within the
whole plots (i.e. four subplots per whole plot, $t_B = 4$). The layout and data for this
experiment are shown in Table 9.10.

TABLE 9.10

SP Layout of the Weed Competition Experiment (Example 9.5A and File
competition.dat)



Block 1 



Block 2 



Block 3 



Block 4 



Whole plot 1 



Whole plot 2 



7.92 


Am 

3.62 


Sm 


Ga 


5.70 


4.49 


- 


Sm 


9.11 


6.77 


Ga 


Am 


7.59 


4.12 



8.02 


Ga 

5.72 




Sm 

4.91 


Ga 

2.20 




Am 

3.71 


7.16 


Sm 


Am 




- 


Am 




Ga 


Sm 


6.32 


3.19 




5.54 


1.97 




6.51 


6.65 


Am 


Ga 




Am 


Sm 




Ga 


Sm 


2.52 


4.70 




2.92 


6.64 




4.91 


5.78 


- 


Sm 




Ga 


- 




Am 


- 


7.05 


5.91 




6.90 


8.18 




2.73 


8.22 



Source: Data from P. Lutman, Rothamsted Research. 

Note: Whole plots are shaded grey (irrigated) or white (non-irrigated) and each whole plot 
contains four subplots to which weed species (Am, Sm, Ga or no weeds, -) are applied. 



For this experiment, the replication for each level of the irrigation factor is four (one 
whole plot in each block), and the replication for each of the four weed treatments is 
eight (one subplot in each whole plot in each block). Each individual treatment (combi- 
nations of irrigation regime and weed species) is replicated four times. 



9.2.1 Defining the Model 

For this standard SP design we have a nested structure with three strata (blocks, whole 
plots within blocks, and subplots within whole plots within blocks) and a crossed treat- 
ment structure with two factors. Ideally, we should label the whole plots and subplots for 
each observation according to the experimental plan, so as to maintain the distinction 
between the treatment and blocking structures. However, for simplicity of notation here 
we label the whole plots within blocks such that the jth whole plot has the jth level of treatment
factor A applied, and label subplots within whole plots such that the kth subplot has
the kth level of treatment factor B applied. The linear model for this design can then be
written as

$$y_{ijk} = \mu + b_i + \alpha_j + w_{ij} + \beta_k + (\alpha\beta)_{jk} + e_{ijk} ,$$

where $y_{ijk}$ is the observed response on the kth subplot within the jth whole plot within
the ith block. Parameter $\mu$ represents the overall population mean, $b_i$ is the effect of the
ith block, $\alpha_j$ the effect of the jth level of treatment factor A, $w_{ij}$ the effect associated with
the jth whole plot located in the ith block, $\beta_k$ the effect of the kth level of treatment factor
B, $(\alpha\beta)_{jk}$ the interaction effect for the jth and kth levels from treatment factors A and B,
respectively, and $e_{ijk}$ the model deviation. The subscripts range over $i = 1 \ldots m$, $j = 1 \ldots t_A$
and $k = 1 \ldots t_B$. Sum-to-zero constraints are applied as
$\sum_j \alpha_j = \sum_k \beta_k = \sum_j (\alpha\beta)_{jk} = \sum_k (\alpha\beta)_{jk} = 0$,
and $\sum_i b_i = \sum_j w_{ij} = 0$. Parameter estimation for a SP design again follows the principles of
least-squares estimation. If we use the symbolic names Y for the response, Block to label
the blocks, WholePlot to label the whole plots within blocks, and Subplot to label the
subplots within whole plots within blocks, then this design can be represented in our
symbolic notation as

Response variable: Y

Explanatory component: [1] + A*B 

= [1] + A+B + A.B 

Structural component: Block/WholePlot/Subplot 

= Block + Block.WholePlot + Block.WholePlot.Subplot 

The term Block.WholePlot.Subplot defines the individual units and so corresponds to the
model deviations, $e_{ijk}$.
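
As a small illustration of how the crossing (`*`) and nesting (`/`) operators in the symbolic notation expand into model terms, the following Python sketch reproduces the two expansions shown above. The helper names `expand_crossed` and `expand_nested` are hypothetical and written only for this example; they do not correspond to any particular statistical package.

```python
from itertools import combinations


def expand_crossed(factors):
    """Expand A*B*... into main effects and interactions, e.g. ['A', 'B', 'A.B']."""
    terms = []
    for size in range(1, len(factors) + 1):
        for combo in combinations(factors, size):
            terms.append(".".join(combo))
    return terms


def expand_nested(factors):
    """Expand A/B/C into ['A', 'A.B', 'A.B.C'] (each factor nested in the previous ones)."""
    return [".".join(factors[: i + 1]) for i in range(len(factors))]


explanatory = expand_crossed(["A", "B"])
structural = expand_nested(["Block", "WholePlot", "Subplot"])
print("[1] + " + " + ".join(explanatory))
print(" + ".join(structural))
```

The last term of the nested expansion identifies the individual units, matching the statement that Block.WholePlot.Subplot corresponds to the model deviations.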

9.2.2 Assessing the Importance of Individual Model Terms 

We omit formulae for the sums of squares for the SP design, but instead concentrate on the
general form and interpretation of the multi-stratum ANOVA table that can be obtained
from statistical software. The total sum of squares of the observations (TotSS) is partitioned
into six components of variation: blocks (BlkSS), the main effect of treatment factor A
(SS(A)), whole plots (WPtSS), the main effect of treatment factor B (SS(B)), the A.B interaction
(SS(A.B)) and subplot or background variation (ResSS), giving the following relationship:

TotSS = BlkSS + SS(A) + WPtSS + SS(B) + SS(A.B) + ResSS . 

As usual, there is a corresponding partition of the total degrees of freedom. An important
aspect of the analysis of a SP design is the partitioning of treatment variation among
strata; variation among main effects for treatment factor A, quantified by SS(A), must be 
compared with the background variation at the whole-plot level, represented by WPtSS. 
However, variation among main effects for treatment factor B and variation among the 
interaction effects, quantified by SS(B) and SS(A.B), respectively, must be compared with 
background variation at the subplot level, represented by ResSS. Within the ANOVA table, 
SS(A) therefore appears within the Block.WholePlot stratum, and SS(B) and SS(A.B) appear 
within the Block.WholePlot.Subplot stratum. The multi-stratum ANOVA table for this 
standard SP design is Table 9.11. As usual, each mean square is calculated by division of 
the corresponding sum of squares by its degrees of freedom. 

TABLE 9.11

Structure of the Multi-Stratum ANOVA Table for a Standard SP Design with $m$ Blocks (Factor
Block), $t_A$ Whole Plots per Block (Factor WholePlot) and $t_B$ Subplots (Factor Subplot) per Whole Plot
and a Total of $N = m \times t_A \times t_B$ Units

Source of Variation               df                                      Sum of Squares   Mean Square   Variance Ratio
Block stratum
  Residual                        $m - 1$                                 BlkSS            BlkMS         BlkMS/WPtMS
Block.WholePlot stratum
  A                               $t_A - 1$                               SS(A)            MS(A)         MS(A)/WPtMS
  Residual                        $(t_A - 1) \times (m - 1)$              WPtSS            WPtMS         WPtMS/ResMS
Block.WholePlot.Subplot stratum
  B                               $t_B - 1$                               SS(B)            MS(B)         MS(B)/ResMS
  A.B                             $(t_A - 1) \times (t_B - 1)$            SS(A.B)          MS(A.B)       MS(A.B)/ResMS
  Residual                        $t_A \times (t_B - 1) \times (m - 1)$   ResSS            ResMS
Total                             $N - 1$                                 TotSS

Note: Treatment factor A ($t_A$ levels) is applied to whole plots within blocks, treatment factor B ($t_B$ levels) is applied
to subplots within whole plots.

In addition to evaluating the variance ratios for treatment terms, we can consider the
variation due to the experimental structure. The block mean square is compared with
the whole-plot residual mean square (WPtMS), because if the block effects are all zero
then variation between blocks arises from whole-plot variation alone. Similarly, the
whole-plot residual mean square is compared with the subplot residual mean square.
Note that the ResDF for the Block.WholePlot stratum will always be smaller than that
for the Block.WholePlot.Subplot stratum, so that the background variation in the lowest
stratum is always estimated with greater precision. If the whole plots form natural
blocks, so that units within the same whole plot are more similar than units in different
whole plots, then we expect the Block.WholePlot stratum residual mean square to be
larger than the Block.WholePlot.Subplot stratum residual mean square. These two facts
together mean that the effects of treatment factor B and the A.B interaction are usually
estimated with more precision than the effects of treatment factor A.
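
The partition of the degrees of freedom for the standard SP design can be checked with a short Python sketch. The function name `split_plot_df` and the dictionary keys are illustrative assumptions; the formulae are those of the standard SP structure with $m$ blocks, $t_A$ whole plots per block and $t_B$ subplots per whole plot.

```python
def split_plot_df(m, t_a, t_b):
    """Degrees of freedom for each line of the standard split-plot ANOVA table."""
    n_units = m * t_a * t_b
    df = {
        "Block": m - 1,
        "A": t_a - 1,
        "WholePlot residual": (t_a - 1) * (m - 1),
        "B": t_b - 1,
        "A.B": (t_a - 1) * (t_b - 1),
        "Subplot residual": t_a * (t_b - 1) * (m - 1),
    }
    # The six components must account for all N - 1 total df.
    assert sum(df.values()) == n_units - 1
    return df


# Dimensions assumed here: 4 blocks, 2 whole-plot treatments, 4 subplot treatments.
df = split_plot_df(m=4, t_a=2, t_b=4)
for source, value in df.items():
    print(f"{source:20s} {value}")
```

For any $t_B > 1$ the subplot ResDF exceeds the whole-plot ResDF by a factor of $t_A (t_B - 1)/(t_A - 1)$, which is why the lowest stratum always has the larger ResDF.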

EXAMPLE 9.5B: WEED COMPETITION EXPERIMENT 

To analyse the data from this experiment, we require factors to define the blocking struc- 
ture corresponding to the physical layout, here called Block, WholePlot and Subplot 
and the two treatment factors, here called Irrigation and Species. Note the distinction 
between the levels of WholePlot and Irrigation (and similarly Subplot and Species) 
as the blocking factors represent the full field plan labelled systematically (shown in 
Table 9.10) to which the treatment factors have been randomized. These factors can be 
found in file competition.dat, along with a variate called Grain containing the response 
(weight of grain at 85% dry matter in tonnes/hectare). The full model written in sym- 
bolic notation is 

Response variable: Grain

Explanatory component: [1] + Irrigation*Species

Structural component: Block/WholePlot/Subplot 

The multi-stratum ANOVA for these data is shown in Table 9.12. As expected, there are
three strata in the ANOVA table: the top stratum (Block) contains no treatment information,
the middle stratum (Block.WholePlot) comprises variation due to the main effect of
Irrigation and the whole-plot residual, and the lowest stratum (Block.WholePlot.Subplot)
contains variation due to the Species main effect, the variation due to the Irrigation.Species
interaction and the subplot residual. The associated residual plots (not shown) indicate
no major violations of the model assumptions, so we can evaluate the treatment terms, as
usual working upwards from the bottom of the ANOVA table. The interaction is highly
significant ($F_{3,18} = 5.582$, $P = 0.007$) indicating that the effect of irrigation on competition
varies among weed species. The presence of this interaction indicates that we cannot simplify
our model for prediction, which needs to use all the explanatory terms, but out of
interest we still examine the main effects to assess the relative importance of the different
model terms. There is very strong evidence for overall differences in the competitive
effects of the weed species ($F_{3,18} = 109.726$, $P < 0.001$). The size of this F-statistic suggests
that the interaction may be relatively small compared to these main effects. Despite the
low ResDF in the whole-plot stratum, there is some evidence for overall differences
between irrigation regimes ($F_{1,3} = 9.480$, $P = 0.054$). Patterns of response in the predictions
are explored further in Example 9.5C.

Examination of variation associated with the experimental structure indicates large
variation between different whole plots ($F_{3,18} = 5.751$, $P = 0.006$) but little additional
variation among blocks ($F_{3,3} = 1.476$, $P = 0.378$). These results suggest that spatial variation
within the field occurs at reasonably fine (i.e. whole-plot) scales.

TABLE 9.12

Multi-Stratum ANOVA Table for Grain Weight from the Weed Competition Experiment
(Example 9.5B)

Source of Variation               df   Sum of Squares   Mean Square   Variance Ratio               P
Block stratum
  Residual                         3           6.6473        2.2158   $F_{3,3} = 1.476$        0.378
Block.WholePlot stratum
  Irrigation                       1          14.2311       14.2311   $F_{1,3} = 9.480$        0.054
  Residual                         3           4.5035        1.5012   $F_{3,18} = 5.751$       0.006
Block.WholePlot.Subplot stratum
  Species                          3          85.9257       28.6419   $F_{3,18} = 109.726$   < 0.001
  Irrigation.Species               3           4.3714        1.4571   $F_{3,18} = 5.582$       0.007
  Residual                        18           4.6986        0.2610
Total                             31         120.3776

Note: Two irrigation regimes (factor Irrigation) were applied to whole plots (factor WholePlot)
within four blocks (factor Block), three weed species and a negative control (no weeds) (factor
Species) were applied to subplots (factor Subplot) within whole plots.
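
The mean squares and variance ratios in Table 9.12 follow directly from the sums of squares and df, with each term tested against the residual mean square of the appropriate stratum. The Python sketch below recomputes them from the tabulated sums of squares; the variable names are illustrative, and small discrepancies in the last decimal place are to be expected because the tabulated sums of squares are themselves rounded.

```python
# (sum of squares, df) for each line of Table 9.12
table = {
    "Block": (6.6473, 3),
    "Irrigation": (14.2311, 1),
    "WholePlot residual": (4.5035, 3),
    "Species": (85.9257, 3),
    "Irrigation.Species": (4.3714, 3),
    "Subplot residual": (4.6986, 18),
}

# Mean square = sum of squares / df
ms = {source: ss / df for source, (ss, df) in table.items()}

# Each term is compared with the residual mean square named as its denominator.
denominators = {
    "Block": "WholePlot residual",
    "Irrigation": "WholePlot residual",
    "WholePlot residual": "Subplot residual",
    "Species": "Subplot residual",
    "Irrigation.Species": "Subplot residual",
}
ratios = {term: ms[term] / ms[denom] for term, denom in denominators.items()}

for term, ratio in ratios.items():
    print(f"{term:20s} MS = {ms[term]:8.4f}  F = {ratio:8.3f}")
```

Note that Irrigation is tested against the whole-plot residual (3 df), whereas Species and the interaction are tested against the subplot residual (18 df).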



9.2.3 Evaluating the Response to Treatments: Predictions from the Fitted Model 

In Section 8.2.4, we stated our policy for making predictions from a crossed structure 
with two factors, namely, that predictions are made from all model terms if the interaction 
term is significant, and that predictions are made from any significant main effects when 
the interaction is not significant. Predictions from the standard SP design take the same 
form (i.e. treatment sample means) as described in Section 8.2.4, but the SEs and SEDs for 
estimates of treatment effects made within the whole-plot stratum take a slightly different 
form. The predictions are summarized in Table 9.13, together with their standard errors 
and associated df. 

The estimated SE for predictions for factor B (averaged across all levels of factor A,
denoted $\hat{\mu}_{\cdot k}$) are calculated from the ResMS, denoted $s^2$ as usual, and the associated df are
the ResDF from the subplot (Block.WholePlot.Subplot) stratum (ResDF). The estimated SE
for predictions for factor A (averaged across all levels of factor B, denoted $\hat{\mu}_{j \cdot}$) take the same



TABLE 9.13

Form of Predicted Population Means for Combinations of Treatment Factors Applied to Whole
Plots (Factor A) and Subplots (Factor B) in the Standard SP Design with Estimated SEs and
Associated df

Description             Population Mean   Prediction                               SE of Prediction                                     df for SE
jth level of A          $\mu_{j \cdot}$   $\hat{\mu}_{j \cdot} = \bar{y}_{\cdot j \cdot}$   $\sqrt{s_W^2 / (t_B \times m)}$                      $(t_A - 1) \times (m - 1)$
kth level of B          $\mu_{\cdot k}$   $\hat{\mu}_{\cdot k} = \bar{y}_{\cdot \cdot k}$   $\sqrt{s^2 / (t_A \times m)}$                        $t_A \times (t_B - 1) \times (m - 1)$
jth level of A with     $\mu_{jk}$        $\hat{\mu}_{jk} = \bar{y}_{\cdot jk}$    $\sqrt{(s_W^2 + (t_B - 1) \times s^2) / (t_B \times m)}$   Equation 9.2
kth level of B

form but are calculated from the residual mean square of the whole-plot (Block.WholePlot)
stratum, estimated as $s_W^2$ = WPtMS, with replication expressed in terms of the number of
subplots present with each level of factor A (as $m \times t_B$). The df associated with these SE are
the ResDF from the whole-plot stratum. The estimated SE for prediction of an individual
treatment combination, denoted $\hat{\mu}_{jk}$, takes a more complex form. These predictions are
calculated as a mean of responses from subplots in different whole plots, and both the
estimated whole-plot and subplot stratum variances, i.e. $s_W^2$ and $s^2$, contribute to the variance
of this mean. The stratum variances are combined, taking account of the number of
subplots per whole plot, as $(s_W^2 + (t_B - 1) \times s^2)/t_B$. This must then be divided by the replication
of each treatment combination, $m$, before taking the square root to obtain the SE given
in Table 9.13. The df for these SE take account of the contributions of the two stratum variances,
using Satterthwaite's formula (Satterthwaite, 1946), as

$$\mathrm{df} = \frac{(m - 1)\,[s_W^2 + (t_B - 1)s^2]^2}{s_W^4/(t_A - 1) + (t_B - 1)s^4/t_A} . \qquad (9.2)$$

This quantity lies between the smaller of the ResDF for the two strata, here $(m - 1) \times (t_A - 1)$,
and the sum of the ResDF for the two strata, here $(m - 1) \times (t_A t_B - 1)$. It approaches its minimum
value when the whole-plot residual mean square is very much larger than the subplot
residual mean square, and takes the maximum value when $s_W^2 = (t_A - 1)s^2/t_A$, i.e. when
the whole-plot residual mean square, WPtMS, is a specific proportion of the ResMS. These
df will usually be non-integer: statistical software can calculate critical values for non-integer
df, but if statistical tables are to be used, then the df should be rounded down to
the nearest integer. Confidence intervals for predictions can be obtained from the SEs and
their associated df in the usual manner.
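
The SE for an individual treatment combination and its Satterthwaite df can be sketched in Python as below. The function names are assumptions made for this illustration; the stratum variances are taken from the sums of squares and df in Table 9.12 (4.5035 on 3 df for whole plots, 4.6986 on 18 df for subplots), with $m = 4$, $t_A = 2$ and $t_B = 4$.

```python
import math


def se_combination(s2_w, s2, m, t_b):
    """SE of a predicted mean for one A x B combination in a standard split-plot design."""
    return math.sqrt((s2_w + (t_b - 1) * s2) / (t_b * m))


def satterthwaite_df(s2_w, s2, m, t_a, t_b):
    """Approximate df (Equation 9.2) for SEs that combine both stratum variances."""
    numerator = (m - 1) * (s2_w + (t_b - 1) * s2) ** 2
    denominator = s2_w ** 2 / (t_a - 1) + (t_b - 1) * s2 ** 2 / t_a
    return numerator / denominator


m, t_a, t_b = 4, 2, 4
s2_w = 4.5035 / 3    # whole-plot residual mean square
s2 = 4.6986 / 18     # subplot residual mean square

se = se_combination(s2_w, s2, m, t_b)
df = satterthwaite_df(s2_w, s2, m, t_a, t_b)
lower = (m - 1) * (t_a - 1)          # whole-plot ResDF
upper = (m - 1) * (t_a * t_b - 1)    # sum of the two ResDF
print(f"SE = {se:.4f}, df = {df:.2f} (bounds {lower} to {upper})")
```

Setting `s2_w = (t_a - 1) * s2 / t_a` in `satterthwaite_df` returns the upper bound exactly, which is a convenient check on the formula.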

The difference between two population treatment means is as usual estimated by
the difference in the two respective sample treatment means. Again, calculation of the
estimated SEs for treatment comparisons (SEDs) is more complex where the comparison
involves contributions from more than one stratum. Comparisons across individual
treatment combinations with the same level of treatment factor A, for example, $\hat{\mu}_{jk} - \hat{\mu}_{js}$
with $k \neq s$, are made entirely within the subplot stratum with the estimated SE calculated
as

$$\mathrm{SE}(\hat{\mu}_{jk} - \hat{\mu}_{js}) = \sqrt{2s^2/m}$$

and associated df equal to the subplot ResDF, $t_A \times (t_B - 1) \times (m - 1)$. Comparisons across
different levels of treatment factor A, for example, $\hat{\mu}_{jk} - \hat{\mu}_{rs}$ with $j \neq r$, involve contributions
from different whole plots and have their estimated SE calculated as

$$\mathrm{SE}(\hat{\mu}_{jk} - \hat{\mu}_{rs}) = \sqrt{2[s_W^2 + (t_B - 1) \times s^2]/(t_B \times m)} ,$$

with associated df given by the Satterthwaite formula in Equation 9.2. This SE is valid
whether the comparison is for the same level of treatment factor B ($k = s$) or for different
levels ($k \neq s$). Comparisons of predictions for different levels of factor A averaged
over all levels of factor B are made entirely within the whole-plot stratum, with estimated
SE

$$\mathrm{SE}(\hat{\mu}_{j \cdot} - \hat{\mu}_{r \cdot}) = \sqrt{2s_W^2/(t_B \times m)}$$

and associated df equal to the whole-plot ResDF, $(t_A - 1) \times (m - 1)$. Finally, comparisons
of predictions for different levels of factor B averaged over all levels of factor A are made
entirely within the subplot stratum, with estimated SE

$$\mathrm{SE}(\hat{\mu}_{\cdot k} - \hat{\mu}_{\cdot s}) = \sqrt{2s^2/(t_A \times m)}$$

and associated df equal to the ResDF, $t_A \times (t_B - 1) \times (m - 1)$. As usual a t-statistic, LSD or
$100(1 - \alpha_s)\%$ CI associated with the null hypothesis of no difference between the population
means can be calculated from the appropriate SED and df.
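
The four kinds of SED for the standard SP design can be collected in one small Python function. This is a sketch under stated assumptions: the function name `split_plot_seds` and the dictionary keys are illustrative, and the numerical inputs are the rounded mean squares from Table 9.12 (whole-plot residual 1.5012, subplot residual 0.2610).

```python
import math


def split_plot_seds(s2_w, s2, m, t_a, t_b):
    """SEDs for the four types of comparison in a standard split-plot design."""
    return {
        # Two combinations sharing the same level of A: subplot stratum only.
        "same A, different B": math.sqrt(2 * s2 / m),
        # Two combinations with different levels of A: both strata contribute.
        "different A": math.sqrt(2 * (s2_w + (t_b - 1) * s2) / (t_b * m)),
        # Main effect of A, averaged over B: whole-plot stratum only.
        "A main effect": math.sqrt(2 * s2_w / (t_b * m)),
        # Main effect of B, averaged over A: subplot stratum only.
        "B main effect": math.sqrt(2 * s2 / (t_a * m)),
    }


seds = split_plot_seds(s2_w=1.5012, s2=0.2610, m=4, t_a=2, t_b=4)
for comparison, sed in seds.items():
    print(f"{comparison:22s} SED = {sed:.4f}")
```

With these inputs the comparison across levels of A has a noticeably larger SED than the comparison within a level of A, reflecting the contribution of the whole-plot variance.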



EXAMPLE 9.5C: WEED COMPETITION EXPERIMENT 

A statistically significant interaction between irrigation regime and weed species was 
found in the ANOVA table (Table 9.12) so the predictive model must use all of the 
explanatory terms, i.e. both main effects and the interaction. The predictions are listed 
in Table 9.14 and shown in Figure 9.1. 

For this experiment we have $m = 4$, $t_A = 2$ and $t_B = 4$, with $s_W^2 = 1.5012$ and $s^2 = 0.2610$
(see Table 9.12). The estimated SEs for these predictions can therefore be calculated as

$$\mathrm{SE}(\hat{\mu}_{jk}) = \sqrt{\frac{s_W^2 + (t_B - 1)s^2}{t_B \times m}} = \sqrt{\frac{1.5012 + (3 \times 0.2610)}{4 \times 4}} = \sqrt{\frac{2.2843}{16}} = 0.3778 ,$$

with associated df calculated from Equation 9.2 as

$$\frac{(m - 1)[s_W^2 + (t_B - 1)s^2]^2}{s_W^4/(t_A - 1) + (t_B - 1)s^4/t_A} = \frac{3 \times [1.5012 + 3 \times 0.2610]^2}{1.5012^2 + \frac{3}{2} \times 0.2610^2} = \frac{3 \times 2.2843^2}{2.3558} = 6.64 \text{ df} .$$

SEDs between predictions for different species (labelled k and s) within the jth irrigation
regime are estimated as

$$\mathrm{SE}(\hat{\mu}_{jk} - \hat{\mu}_{js}) = \sqrt{2s^2/m} = \sqrt{2 \times 0.2610/4} = 0.3613 ,$$



TABLE 9.14

Predicted Grain Weight for All Combinations of
Irrigation Regime and Weed Species, with Comparisons
across Irrigation Regimes within Species (Example 9.5C)

               Irrigation
Species       No       Yes     Difference (No - Yes)
-           8.117    7.182     0.935
Am          3.485    2.710     0.775
Ga          6.680    4.075     2.605
Sm          6.595    5.575     1.020

Note: Prediction SE = 0.3778 with 6.64 df. SED for comparisons
within irrigation regime = 0.3613 on 18 df, SED for comparisons
across irrigation regimes = 0.5344 on 6.64 df.

FIGURE 9.1

Predicted grain weight (t/ha) for each combination of species (x-axis; - = no weeds) and irrigation regime (o = no 
irrigation, • = irrigation) in the weed competition experiment (Example 9.5C). SED for comparisons within each 
irrigation regime = 0.361 on 18 df, SED for comparisons across irrigation regimes = 0.534 on 6.64 df. 



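with 18 df, and SEDs for different irrigation regimes (labelled j and r) within the kth
species are estimated as

$$\mathrm{SE}(\hat{\mu}_{jk} - \hat{\mu}_{rk}) = \sqrt{\frac{2(s_W^2 + (t_B - 1)s^2)}{t_B \times m}} = \sqrt{\frac{2(1.5012 + 3 \times 0.2610)}{4 \times 4}} = \sqrt{\frac{2 \times 2.284}{16}} = 0.5344 ,$$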

with 6.64 df (from the calculation above). In this case, SEDs for comparisons across irrigation
regimes are substantially larger than those for comparisons within each irrigation
regime, so the former comparisons are considerably less precise, because of relatively large
variation between whole plots and the small ResDF at that level.

To interpret the patterns of yield response, we are interested in comparing the effect of
irrigation on the competitive effects of each species (four comparisons, shown in Table
9.14), and in comparing grain yields with each species present against the control in
the absence of irrigation (another three comparisons). To allow for the number of tests
(seven), we use a Bonferroni correction to the significance level (Section 8.8.1) giving
an adjusted significance level of 0.05/7 = 0.007. The critical value of the t-distribution
at half this adjusted level (for a two-sided test) is 3.830 for 6.64 df and 3.034 for 18 df.
The LSD for differences across species for no irrigation is then 1.0960 (= 3.034 x 0.3613)
and the LSD for comparisons within species across irrigation regimes is 2.0457
(= 3.830 x 0.5344). It is clear that all of the weed species reduce grain yield in the
absence of irrigation, with the reduction (competitive effect) being greatest for Am
(black-grass). The application of irrigation further increases the competitive effects for
Ga (cleavers) but has no real impact on yield in other cases.
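
The comparisons in this example can be reproduced with a short Python sketch that applies the Bonferroni-adjusted LSDs to the predictions in Table 9.14. The critical values 3.034 and 3.830 are simply those quoted in the text (they are not recomputed here), and the variable names are illustrative assumptions.

```python
# Predicted grain weights (t/ha) from Table 9.14
predictions = {
    "No": {"-": 8.117, "Am": 3.485, "Ga": 6.680, "Sm": 6.595},
    "Yes": {"-": 7.182, "Am": 2.710, "Ga": 4.075, "Sm": 5.575},
}

n_tests = 7
alpha_adjusted = 0.05 / n_tests          # Bonferroni-adjusted significance level

# Critical t values and SEDs as quoted in the text.
t_within, sed_within = 3.034, 0.3613     # within irrigation regime, 18 df
t_across, sed_across = 3.830, 0.5344     # across irrigation regimes, 6.64 df
lsd_within = t_within * sed_within
lsd_across = t_across * sed_across

# Effect of irrigation within each species (No - Yes), judged against lsd_across.
irrigation_effect = {}
for species in predictions["No"]:
    diff = predictions["No"][species] - predictions["Yes"][species]
    irrigation_effect[species] = (round(diff, 3), abs(diff) > lsd_across)

# Effect of each weed species against the control without irrigation.
weed_effect = {}
for species in ("Am", "Ga", "Sm"):
    diff = predictions["No"]["-"] - predictions["No"][species]
    weed_effect[species] = (round(diff, 3), abs(diff) > lsd_within)

print(f"adjusted alpha = {alpha_adjusted:.4f}")
print("irrigation effect:", irrigation_effect)
print("weed effect (no irrigation):", weed_effect)
```

Only the irrigation difference for Ga exceeds its LSD, whereas all three weed species differ from the control by more than the within-regime LSD, in line with the interpretation given above.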





9.2.4 Drawbacks and Variations of the Split-Plot Design 

The most important criterion that determines whether a SP design should be used is the 
existence of practical or operational limitations on experimental units to which the pro- 
posed treatments can be applied. If the same experimental unit can reasonably be used 
for all factors and suitable (homogeneous) blocks are available, then the RCBD is almost 
always a better design because all treatment comparisons then have the same precision. 
The main drawback of the SP design is the small number of ResDF available within the
whole-plot stratum, which limits the precision of any comparisons made across different
levels of treatments applied to whole plots.

We have described one standard form of the SP design in which complete replicates 
of the design are arranged in separate blocks. There are several common variations on 
this design. For example, a SP structure may also be implemented without partitioning 
replicates into separate blocks. In this case, the design corresponds to a CRD at the whole- 
plot level, rather than a RCBD, with the whole plots split into subplots as in the standard 
design. This type of design may be useful if there is no evidence or expectation of hetero- 
geneity between whole plots assigned to different replicate blocks, as omitting the blocks 
increases the ResDF in the whole-plot stratum by m - 1 df. The model for this design is 
similar to that in Section 9.2.1 but excludes the block effect ($b_i$), and the ANOVA table hence
also omits the block stratum. We illustrate this situation in the following example. 



EXAMPLE 9.6: TREE SEEDLING GROWTH* 

The effects of temperature and soil substrate on growth of tree seedlings are of interest. 
A glasshouse containing six temperature-controlled beds, each of which can be set at 
only one temperature at a time, will be used for this experiment. Each bed can accom- 
modate two trays of plants, and substrates can be applied to individual trays. Previous 
experiments have shown no evidence of coarse-scale spatial heterogeneity within the 
glasshouse, hence a SP design without blocks is appropriate, with beds as whole plots 
and trays within beds as subplots. A randomized allocation of three replicates each 
of two temperatures (15°C and 20°C) is made onto the six beds. One tray in each bed 
contains plants growing in a vermiculite-based substrate and the other contains plants 
growing in a chipped-wood-based substrate, using a randomized allocation within 
beds. A schematic layout for this experiment is set out in Table 9.15. 

With an obvious nomenclature for factors, the form of the symbolic model is 

Explanatory component: [1] + Substrate*Temperature 

Structural component: Bed/Tray 

A dummy ANOVA table for this design is shown in Table 9.16. The Bed stratum has four 
ResDF here, which is very low, but a blocked form of this design would have only two 
ResDF at this level. This therefore seems a more sensible design if there is no previous 
evidence of heterogeneity across beds within the glasshouse. 
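
The gain in whole-plot ResDF from omitting blocks can be verified with a short Python sketch. The function name `whole_plot_resdf` and its arguments are assumptions for illustration; the blocked alternative considered here is three blocks of two beds each.

```python
def whole_plot_resdf(n_whole_plots, t_a, n_blocks=0):
    """Residual df in the whole-plot stratum.

    n_blocks=0 means no blocking (a CRD at the whole-plot level);
    otherwise whole plots are grouped into n_blocks complete blocks.
    """
    block_df = n_blocks - 1 if n_blocks > 0 else 0
    return (n_whole_plots - 1) - block_df - (t_a - 1)


beds, temperatures = 6, 2
unblocked = whole_plot_resdf(beds, temperatures)             # SP design without blocks
blocked = whole_plot_resdf(beds, temperatures, n_blocks=3)   # three blocks of two beds
print(f"Bed stratum ResDF: unblocked = {unblocked}, blocked = {blocked}")
```

The difference between the two values equals the number of blocks minus one, as stated in the discussion of the unblocked variant.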



TABLE 9.15

Layout of Tree Seedling Growth Trial (Example 9.6)

          Tray 1               Tray 2
Bed 1     20°C/Chips           20°C/Vermiculite
Bed 2     20°C/Vermiculite     20°C/Chips
Bed 3     15°C/Chips           15°C/Vermiculite
Bed 4     15°C/Vermiculite     15°C/Chips
Bed 5     15°C/Chips           15°C/Vermiculite
Bed 6     20°C/Chips           20°C/Vermiculite

Note: Two temperatures (15°C or 20°C) each randomly allocated to three
beds and two substrates (chips or vermiculite) randomly allocated
to trays in each bed.

TABLE 9.16

Dummy Multi-Stratum ANOVA Table for Tree
Seedling Growth Trial (Example 9.6)

Source of Variation          df
Bed stratum
  Temperature                 1
  Residual                    4
Bed.Tray stratum
  Substrate                   1
  Substrate.Temperature       1
  Residual                    4
Total                        11

Note: Two temperatures (factor Temperature) were allocated
randomly to beds and two substrates (factor
Substrate) randomly to trays within beds.



Another common variant of the SP design occurs with CE cabinets (or rooms) or glasshouse
compartments where the whole-plot treatment is an environmental condition (e.g.
temperature, humidity or CO2 level). In this situation the cabinet is considered as the
whole plot, but often the number of cabinets available is limited so that different replicate
blocks comprise separate runs in different time periods using the same set of cabinets.
To even out any bias associated with individual cabinets, a row-column design can be
used for the allocation of treatments to cabinets across the runs (considering runs as
the row blocking factor and cabinet as the column blocking factor). Ideally, the application
of whole-plot treatments is balanced across the cabinets as well as across runs.
If the design is orthogonal, for example, if a LS design is used at the whole-plot level,
then variation associated with individual cabinets and individual runs can be separated
from the remaining variation between whole plots, potentially leading to a more precise
evaluation of the whole-plot treatment effects (if sufficient ResDF are present at that
level). Example 9.7 illustrates this situation and shows the structure of the multi-stratum
ANOVA table.

EXAMPLE 9.7: ENRICHED CO2 TRIAL*

An experiment was conducted to investigate the variation in growth rates of varieties
of spring wheat under ambient CO2 compared with two richer CO2 conditions in
a set of three growth cabinets. Within each cabinet, pots of six separate varieties were
grown. As differences were expected to be small, each CO2 treatment was repeated six
times, and the limited number of growth cabinets meant that this replication could be
implemented only by repeating the experiment over time. The allocation of CO2 levels
to cabinets was assigned as two linked replicates of a 3 x 3 LS design (a Latin rectangle).

A schematic layout for this experiment is shown in Table 9.17.

The structure of this experiment can be written as

Explanatory component: [1] + CO2*Variety

Structural component: Square + Square.Run + Cabinet

+ Square.Run.Cabinet + Square.Run.Cabinet.Pot

where factor Square (two levels) labels the two repeats of the LS part of the design 
and runs (factor Run, three levels) are labelled sequentially within each Square. The 
same cabinets are used in both squares and are expected to have a consistent effect. 

TABLE 9.17

Design for Enriched CO2 Trial, with CO2 Treatments (Ambient or Two Levels of
Enrichment) Allocated to Cabinets as Two Replicates of a 3 x 3 LS (Example 9.7)

                    Cabinet 1           Cabinet 2           Cabinet 3
Square 1   Run 1    Enriched level 1    Ambient CO2         Enriched level 2
           Run 2    Enriched level 2    Enriched level 1    Ambient CO2
           Run 3    Ambient CO2         Enriched level 2    Enriched level 1
Square 2   Run 1    Enriched level 2    Enriched level 1    Ambient CO2
           Run 2    Enriched level 1    Ambient CO2         Enriched level 2
           Run 3    Ambient CO2         Enriched level 2    Enriched level 1

Note: Six varieties allocated to positions within each cabinet at random (not shown).



In the terminology of a SP design, the cabinets within each run (Square.Run.Cabinet
term) correspond to the whole plots, and pots within cabinets correspond to subplots
(term Square.Run.Cabinet.Pot). A dummy ANOVA table for this structure is shown
in Table 9.18.

The CO2 treatments were applied to the individual cabinets within each run of the
experiment, and so the CO2 sum of squares appears in the Square.Run.Cabinet stratum.
Due to the repetition over time, this experiment has achieved eight ResDF in this
stratum for testing the CO2 main effects. The different varieties were applied to pots
within cabinets; hence, the Variety and Variety.CO2 sums of squares appear in the
bottom stratum, which corresponds to the Square.Run.Cabinet.Pot combinations.
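The df in this dummy ANOVA can be checked by simple bookkeeping. The following Python sketch is not part of the original analysis; the function name and dictionary keys are our own illustrative choices. It counts the df in each stratum from the numbers of squares, runs, cabinets and pots, assuming the structural and explanatory components given above.

```python
def dummy_anova_df(squares=2, runs=3, cabinets=3, pots=6, co2_levels=3, varieties=6):
    """Degrees of freedom for each line of the dummy ANOVA in Example 9.7.

    Assumes runs are nested within squares, cabinets are crossed with
    squares and runs, and pots are nested within cabinet-run combinations.
    """
    whole_plots = squares * runs * cabinets        # cabinet-run combinations
    units = whole_plots * pots                     # pots in total

    square = squares - 1
    square_run = squares * (runs - 1)              # runs within squares
    cabinet = cabinets - 1
    # df left among the cabinet-run combinations after removing the strata above
    whole_plot_total = (whole_plots - 1) - square - square_run - cabinet
    co2 = co2_levels - 1
    # df among pots within cabinet-run combinations
    pot_total = units - whole_plots
    variety = varieties - 1
    variety_co2 = variety * co2

    return {
        "Square": square,
        "Square.Run": square_run,
        "Cabinet": cabinet,
        "CO2": co2,
        "Square.Run.Cabinet residual": whole_plot_total - co2,
        "Variety": variety,
        "Variety.CO2": variety_co2,
        "Square.Run.Cabinet.Pot residual": pot_total - variety - variety_co2,
        "Total": units - 1,
    }


df = dummy_anova_df()
for source, value in df.items():
    print(f"{source:35s}{value:4d}")
```

Setting squares=1 in this sketch leaves only two ResDF for testing the CO2 main effects, which illustrates why the repetition over time matters.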



TABLE 9.18

Dummy Multi-Stratum ANOVA Table for Enriched CO2 Trial with Three CO2 Treatments
(Factor CO2, Applied to Cabinets) on Growth of Six Plant Varieties (Factor Variety,
Applied to Pots) (Example 9.7)

Source of Variation                   df
Square stratum
  Residual                             1
Square.Run stratum
  Residual                             4
Cabinet stratum
  Residual                             2
Square.Run.Cabinet stratum
  CO2                                  2
  Residual                             8
Square.Run.Cabinet.Pot stratum
  Variety                              5
  Variety.CO2                         10
  Residual                            75
Total                                107

Note: The experiment used two replicate LSs (factor Square) each
with three rows (factor Run) and columns (factor Cabinet)
and with six pots in each cabinet in each run (factor Pot).

A third common extension to the standard SP design involves subdivision of the
subplots into even smaller units (i.e. sub-subplots). This design, known as a
split-split-plot design, corresponds to a blocking structure with four nested strata.
In the simplest extension, a third treatment factor is applied to the sub-subplots, and
the explanatory component becomes a three-way crossed structure. The analysis of data
from such designs is a straightforward extension of the multi-stratum ANOVA table
presented for SP designs in Section 9.2.2. In theory, any number of divisions and
corresponding extra treatment factors may be included. However, the drawbacks of the
standard SP design are amplified as new levels of subdivision are added, and so this
type of design should be used only if the blocking structure matches real constraints
on the experimental procedure. In addition, further complications arise from the
calculations of SEs and SEDs because of the additional strata. Mead et al. (2012,
Chapter 18) discuss variations on SP designs, including related designs such as the
strip-plot or criss-cross design, in more detail.

Finally, multiple treatment factors can be included in one or more of the strata of a SP
design. The analysis then involves partitioning the treatment variation between main
effects and interactions corresponding to the underlying factors, following principles
introduced in Chapter 8.



9.3 The Balanced Incomplete Block Design 

In Section 3.3.5, the balanced incomplete block design (BIBD) was presented as a useful
alternative to the RCBD for a situation in which the size of each homogeneous block is
smaller than the number of treatments. As for the RCBD, the blocking structure associated
with a BIBD consists of two nested strata, with the structural component written as

Structural component: Block/Unit
= Block + Block.Unit

using the symbolic names Block to label the blocks, and Unit to label the experimental
units within blocks.

In general, an incomplete block design (IBD) is likely to be useful when either the
number of treatments is very large, or the block size has to be very small. Some typical
examples include

• Variety trials. Often many (> 100) varieties are grown within the same field trial. It
is usually not possible to locate homogeneous blocks large enough to contain all
varieties.

• Two-colour microarray experiments. In this experimental framework, two treatments -
labelled with different dyes - can be applied to a microarray slide simultaneously.
Where more than two treatments are investigated, the combinations of treatments
applied to each slide should be carefully chosen because direct comparisons within
the same slide are generally more reliable than indirect comparisons across
different slides.

There are many classes of incomplete block design, but here we concentrate on the class
of BIBDs. Some new notation is required to describe this type of design, and this is
summarized in Table 9.19.






TABLE 9.19

Summary of Notation for Balanced Incomplete Block Designs (BIBD)

Symbol    Description
N         Total number of experimental units (N = t × n = m × u)
t         Number of treatments
n         Number of replicates of each treatment
m         Number of blocks
u         Number of units per block
λ         Number of times each treatment pair occurs together within a block



We still use t and n to denote the number of treatments and the number of replicates
per treatment, respectively. The total number of experimental units is therefore equal
to N = n × t. These units will be arranged in a total of m incomplete blocks each
consisting of u experimental units, where u < t, so the block size is smaller than the
number of treatments. The total number of experimental units can then alternatively be
written as N = m × u. In a BIBD, each pair of treatments must occur together within
blocks exactly the same number of times, denoted λ. This condition ensures that all
treatment comparisons are evaluated with the same precision, as required by the
definition of balanced designs.

EXAMPLE 9.8: GRAIN PROTEIN CONTENT* 

In Example 3.8, we described an experiment to evaluate the grain protein content for
six different varieties (t = 6), A to F, each with five replicates (n = 5). Protein content
was measured during six sessions (blocks, m = 6), with five samples processed in each
session (u = 5). Each treatment was omitted from just one of the sessions as shown in
Table 4.6. In this design, each pair of treatments appears together in only four of the
six blocks (λ = 4). For example, treatments C and E both appear in the first four
blocks, and treatments B and D both appear in the last four.

BIBDs cannot be constructed for every combination of treatment number, block size,
and level of replication because of the requirement that all pairs of treatments must occur
together within blocks the same number of times across the design. The following two
relationships between the five design parameters must hold in order for a BIBD to exist
(but do not guarantee that such a design does exist).

$$ t \times n = m \times u , $$

$$ n \times (u - 1) = (t - 1) \times \lambda . $$

The first relationship was introduced above. The second relationship concerns the
number of within-block comparisons for each treatment. The left-hand side calculates the
number of comparisons for a given treatment in terms of the number of blocks in which
it appears (the number of replicates, n) multiplied by the number of comparisons in
each block (one less than the number of units per block, u - 1). The right-hand side
calculates this quantity using the number of times each treatment pair occurs together
within blocks (λ) multiplied by the number of other treatments (t - 1). Given the level of
replication (n), number of treatments (t) and block size (u), the first relationship can be
used to identify the number of blocks as m = (n × t)/u. The second relationship can be
rearranged as

$$ \lambda = \frac{n (u - 1)}{(t - 1)} , $$

to indicate the number of times each treatment pair occurs in a block together. As for all
the other design parameters, λ must be a positive integer - any non-integer value indicates
that a BIBD does not exist. In addition, Bailey (2008) quotes Fisher's inequality which states
that, in a BIBD, the number of blocks must be greater than or equal to the number of
treatments (m ≥ t). Finally, a BIBD is called resolvable if the blocks of the design can be
grouped into sets such that each set contains one replicate of each treatment. Obviously
this can occur only when the number of treatments is exactly divisible by the number of
units per block (i.e. t/u is an integer).

Tables of BIBDs are given in some textbooks (e.g. Cochran and Cox, 1957; Fisher and
Yates, 1963; Box et al., 1978), and some of these designs can be generated by statistical
software.
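The necessary conditions described above are easy to check before consulting tables of designs. The following Python sketch is our own illustration (the function name and the returned keys are hypothetical, not taken from any statistical package). It tests whether m and λ are integers and whether Fisher's inequality holds. Passing these checks does not guarantee that a BIBD exists, and the resolvability flag reflects only the divisibility condition.

```python
def bibd_parameters(t, n, u):
    """Check the necessary (not sufficient) conditions for a BIBD.

    t = number of treatments, n = replicates per treatment,
    u = units per block. Returns the implied parameters, or None
    if any necessary condition fails.
    """
    if not 1 < u < t:
        return None                    # blocks must be incomplete (u < t)
    if (n * t) % u != 0:
        return None                    # m = (n x t)/u must be an integer
    if (n * (u - 1)) % (t - 1) != 0:
        return None                    # lambda = n(u - 1)/(t - 1) must be an integer
    m = n * t // u
    lam = n * (u - 1) // (t - 1)
    if m < t:
        return None                    # Fisher's inequality: m >= t
    return {"N": n * t, "m": m, "lam": lam, "may_be_resolvable": t % u == 0}


# Search over small designs for seven treatments
for u in (3, 4):
    for n in range(1, 5):
        result = bibd_parameters(7, n, u)
        if result is not None:
            print(f"t=7, n={n}, u={u}: {result}")
```

For seven treatments with blocks of size three or four and at most four replicates, only two parameter sets pass these checks. They are the two designs considered in Example 9.9A.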

EXAMPLE 9.9A: DESIGNING A BIBD EXPERIMENT FOR SEVEN TREATMENTS*

A scientist is interested in evaluating seven different treatments (t = 7), using blocks
of size three or four (u = 3 or 4), with up to four replicates of each treatment (n ≤ 4),
giving a maximum of 28 units in total (N ≤ 28). Because seven is a prime number,
there are no resolvable BIBDs for this scenario. There are two possible BIBDs that fit
these constraints, and both use seven blocks (the minimum possible number). The
first design uses seven blocks of size three, 21 units in total, with three replicates
of each treatment and each pair of treatments occurring together in just one of the
blocks (m = t = 7, u = n = 3, λ = 1, see Table 9.20a). The second design uses seven blocks
of size four, 28 units in total, with four replicates of each treatment and each pair of
treatments occurring together in exactly two of the blocks (m = t = 7, u = n = 4, λ = 2,
see Table 9.20b).

There is an obvious connection between these two designs: for each block in the first
design, there is a corresponding block in the second design such that the pair of blocks
contains the full set of treatments (e.g. block 1 in Table 9.20a and block 3 in Table 9.20b).
Further information is required to make an informed decision on which design is more
appropriate for the experiment (see Example 9.9B), though the greater replication will
generally provide more power, provided that the larger block size still contains
homogeneous units.

TABLE 9.20

BIBDs for Seven Treatments in Seven Blocks with (a) Three Units per
Block or (b) Four Units per Block (Example 9.9A)

            (a)                           (b)
            Unit 1   Unit 2   Unit 3      Unit 1   Unit 2   Unit 3   Unit 4
Block 1     1        5        7           4        1        2        7
Block 2     6        4        7           7        6        5        2
Block 3     2        5        4           3        6        2        4
Block 4     2        7        3           3        4        5        7
Block 5     4        3        1           6        5        1        4
Block 6     5        3        6           6        3        7        1
Block 7     2        6        1           3        5        1        2
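The balance of the two designs in Table 9.20, and the connection between them, can be checked directly. The following Python sketch is our own illustration (the function and variable names are hypothetical). It counts how often each pair of treatments occurs together within a block and compares the blocks of one design with the complements of the blocks of the other.

```python
from collections import Counter
from itertools import combinations

# Blocks transcribed from Table 9.20 (treatments labelled 1-7)
DESIGN_A = [(1, 5, 7), (6, 4, 7), (2, 5, 4), (2, 7, 3), (4, 3, 1), (5, 3, 6), (2, 6, 1)]
DESIGN_B = [(4, 1, 2, 7), (7, 6, 5, 2), (3, 6, 2, 4), (3, 4, 5, 7),
            (6, 5, 1, 4), (6, 3, 7, 1), (3, 5, 1, 2)]


def concurrences(blocks, t):
    """Number of blocks in which each pair of treatments occurs together."""
    counts = Counter()
    for block in blocks:
        for pair in combinations(sorted(block), 2):
            counts[pair] += 1
    # Include pairs that never occur together (a Counter returns 0 for these)
    return {pair: counts[pair] for pair in combinations(range(1, t + 1), 2)}


def replication(blocks):
    """Number of units allocated to each treatment."""
    return Counter(trt for block in blocks for trt in block)


def complements(blocks, t):
    """The treatments missing from each block, as a set of frozensets."""
    everything = frozenset(range(1, t + 1))
    return {everything - frozenset(block) for block in blocks}


for name, design in (("a", DESIGN_A), ("b", DESIGN_B)):
    lam_values = set(concurrences(design, 7).values())
    reps = set(replication(design).values())
    print(f"Design ({name}): replication {reps}, pair concurrences {lam_values}")

print(complements(DESIGN_A, 7) == {frozenset(b) for b in DESIGN_B})
```

A design is balanced in the sense used here when the set of pair concurrences contains a single positive value, which is then λ.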



9.3.1 Defining the Model 

The linear model associated with a BIBD takes the same form as for the RCBD (see Section
7.1). For a single treatment factor, the model can be written as

$$ y_{ij} = \mu + b_i + \tau_j + e_{ij} , $$

where $y_{ij}$ represents the observation on the jth treatment in the ith block, μ the overall
mean, $b_i$ the effect of the ith block, $\tau_j$ the effect of the jth treatment and $e_{ij}$ the
deviation for this observation. Again, ideally we should label units within blocks according
to the experimental layout to maintain the distinction between the blocking and treatment
structures, but for simplicity we omit this distinction. The subscript i runs from 1 to m, and
the subscript j runs from 1 to t. Note, however, that because we have only
N = t × n = m × u units, not all combinations are present. In our usual symbolic notation,
and with obvious definitions of the factors, the full description of the model is

Response variable: Y

Explanatory component: [1] + Treatment

Structural component: Block/Unit

In this case the Block.Unit combinations label the full set of units and correspond to the
model deviations.

Complexities arise in analysis of the BIBD because the block and treatment factors are
not orthogonal, as only a subset of the full treatment set is present in each block. The
simplest analysis, based on comparisons between treatments within blocks, and hence called
the within-block or intra-block analysis, estimates treatment effects after adjustment for
(elimination of) block effects. These treatment estimates take the form

$$ \hat{\tau}_{j(w)} = \frac{1}{EF} \left( \bar{y}_{\cdot j} - \bar{B}_j \right) , $$

where $\bar{y}_{\cdot j}$ is the sample mean for the jth treatment, and $\bar{B}_j$ the mean of all
units in blocks that contain the jth treatment. For example, in the BIBD in Table 9.20a,
treatment 3 occurs only in blocks 4, 5 and 6, so $\bar{B}_3$ would correspond to the average of
all observations in those three blocks. Finally, EF is the efficiency factor, calculated as

$$ EF = \frac{\lambda \times t}{n \times u} . $$

The EF is the proportion of the information on treatment differences available from the
within-block analysis, with 0 < EF ≤ 1. The remainder of the information is available from
comparisons across blocks, called the between-block or inter-block analysis. When the
EF is less than 1, the inter-block estimates of treatment effects take the form

$$ \hat{\tau}_{j(b)} = \frac{1}{(1 - EF)} \left( \bar{B}_j - \bar{y} \right) . $$

This estimate involves the jth treatment via $\bar{B}_j$, described above. The use of these
intra- and inter-block estimates is discussed in Section 9.3.2.
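To see how the within-block estimates remove block effects, it can help to apply the formula to artificial data in which the answer is known. The following Python sketch is our own illustration. The data are hypothetical and noise-free, generated as overall mean + block effect + treatment effect on the design of Table 9.20a, with treatment effects chosen to sum to zero. Under these assumptions, the estimates should reproduce the treatment effects exactly.

```python
# BIBD of Table 9.20a: seven blocks of three units, treatments labelled 1-7
BLOCKS = [(1, 5, 7), (6, 4, 7), (2, 5, 4), (2, 7, 3), (4, 3, 1), (5, 3, 6), (2, 6, 1)]
T, N_REPS, U, LAM = 7, 3, 3, 1


def mean(values):
    return sum(values) / len(values)


def intra_block_estimates(blocks, y, t, n, u, lam):
    """Within-block treatment estimates, (treatment mean - Bbar_j) / EF.

    y[i][k] is the response on the k-th unit of block i, stored in the
    same order as the treatment labels in blocks[i].
    """
    ef = lam * t / (n * u)                       # efficiency factor
    estimates = {}
    for j in range(1, t + 1):
        # observations on treatment j
        on_treatment = [y[i][k] for i, blk in enumerate(blocks)
                        for k, trt in enumerate(blk) if trt == j]
        # all observations in the blocks that contain treatment j
        in_blocks = [obs for i, blk in enumerate(blocks) if j in blk for obs in y[i]]
        estimates[j] = (mean(on_treatment) - mean(in_blocks)) / ef
    return ef, estimates


# Hypothetical noise-free data (values chosen arbitrarily for illustration)
MU = 10.0
BLOCK_EFFECT = [4.0, -2.0, 1.0, 0.0, -3.0, 2.0, -2.0]
TAU = {1: -3.0, 2: -2.0, 3: -1.0, 4: 0.0, 5: 1.0, 6: 2.0, 7: 3.0}   # sums to zero
Y = [[MU + BLOCK_EFFECT[i] + TAU[trt] for trt in blk] for i, blk in enumerate(BLOCKS)]

EF, TAU_HAT = intra_block_estimates(BLOCKS, Y, T, N_REPS, U, LAM)
for j in sorted(TAU_HAT):
    print(f"treatment {j}: true effect {TAU[j]:5.1f}, within-block estimate {TAU_HAT[j]:6.3f}")
```

The unadjusted treatment means would not recover these effects, because each treatment mean also carries the average effect of the particular blocks in which that treatment appears.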



9.3.2 Assessing the Importance of Individual Model Terms 

The ANOVA table for the BIBD contains information on treatments, and therefore a sum
of squares for treatments, in both the Block stratum (the between-block treatment sum of
squares, BTrSS, corresponding to the inter-block estimates) and in the Block.Unit stratum
(the within-block treatment sum of squares, WTrSS, corresponding to the intra-block
estimates), see Table 9.21.

Within each stratum, t - 1 df are allocated for estimation of treatment effects and the
ResDF are calculated by subtraction from the TotDF (which equal m - 1 df for the Block
stratum and N - m df for the Block.Unit stratum). As usual, the mean squares are obtained
by division of each sum of squares by its df. The treatment mean squares are compared
with the residual mean squares from the strata in which they occur, and the block mean
square can be compared with the residual mean square from the Block.Unit stratum.

The immediate question on construction of this ANOVA table is: How do we reconcile
the two separate variance ratios for treatments? If the EF is large (close to 1), then most
of the treatment information lies within blocks. Since variation within blocks tends to be
smaller (often much smaller) than variation between blocks, in these cases it makes sense
to base inference on the within-block estimates of treatment effects. The population
treatment means, in this case, are then estimated as

$$ \hat{\mu}_{j(w)} = \bar{y} + \hat{\tau}_{j(w)} \quad \text{with} \quad \mathrm{SE}(\hat{\mu}_{j(w)} - \hat{\mu}_{k(w)}) = \sqrt{(2 \times s^2)/(n \times EF)} , $$

where, as usual, s² = ResMS is the estimate of background variability at the lowest
stratum, and the SE applies to the difference between the estimated means for two
treatments j and k. If the EF is small, then we might lose substantial information by
ignoring treatment comparisons between blocks, and both sources of treatment information
should be used.



TABLE 9.21

Structure of the Multi-Stratum ANOVA Table for a BIBD with m Blocks (Factor Block), u
Units per Block (Factor Unit), t Treatments (Factor Treatment) and N = m × u Units in Total

Source of Variation    df               Sum of Squares    Mean Square    Variance Ratio
Block stratum
  Treatment            t - 1            BTrSS             BTrMS          BTrMS/BlkMS
  Residual             m - t            BlkSS             BlkMS          BlkMS/ResMS
Block.Unit stratum
  Treatment            t - 1            WTrSS             WTrMS          WTrMS/ResMS
  Residual             N - m - t + 1    ResSS             ResMS
Total                  N - 1            TotSS

However, the two variance ratios may give contradictory indications of the importance
of treatment effects, and in cases when t = m (as in Example 9.9) the ResDF for the Block
stratum will be zero, so that the variance ratio for treatments cannot be formed within
that stratum. These problems can be solved by combination of information from the two
strata to give a single estimate of the treatment effects, a single test statistic for the
treatment terms and a revised estimate of the stratum variances. This combined estimate is a
weighted mean of the two components, with the weighting determined by the (revised)
stratum variances and their estimated df, and is most easily obtained from linear mixed
models (see Chapter 16), although some implementations of the multi-stratum ANOVA
(e.g. GenStat) can also provide these estimates.



EXAMPLE 9.9B: DESIGNING A BIBD EXPERIMENT FOR SEVEN TREATMENTS*

We now have more information to assess the proposed designs of Example 9.9A. The
first design (Table 9.20a) has seven blocks of three units for seven treatments each with
three replicates (N = 21, t = m = 7, n = u = 3), with each pair of treatments appearing
together only in one of the blocks (λ = 1). Its EF is therefore (λ × t)/(n × u) = 7/9 = 0.778, so
almost 78% of the treatment information is available within blocks. The SE for treatment
comparisons based on the within-block estimates takes the form

$$ \mathrm{SE}(\hat{\mu}_{j(w)} - \hat{\mu}_{k(w)}) = \sqrt{(2 \times s^2)/(n \times EF)} = \sqrt{0.857 \times s^2} = 0.93 s , $$

with N - m - t + 1 = 21 - 7 - 7 + 1 = 8 ResDF.

The second design (Table 9.20b) has seven blocks of size four for seven treatments each
with four replicates (N = 28, t = m = 7, n = u = 4), with each pair of treatments appearing
together in two of the blocks (λ = 2). Its EF is therefore (λ × t)/(n × u) = 7/8 = 0.875, so
almost 88% of the treatment information is available within blocks. The SED for
treatment comparisons takes the form

$$ \mathrm{SE}(\hat{\mu}_{j(w)} - \hat{\mu}_{k(w)}) = \sqrt{(2 \times s^2)/(n \times EF)} = \sqrt{0.571 \times s^2} = 0.76 s , $$

with N - m - t + 1 = 28 - 7 - 7 + 1 = 15 ResDF.

So, for a 33% increase in the number of units (from 21 to 28) we get a 13% increase in
the proportion of information available within blocks (from 0.78 to 0.88) and an 18%
reduction in the SE for treatment comparisons (if we assume that s would be similar
across the two experiments). In addition, the ResDF has almost doubled from a value
that is barely adequate (ResDF = 8) to a value (ResDF = 15) that is likely to give a
reasonable estimate of the background variation. Given that the original experimental outline
allowed for 28 pots, this gives several good reasons to choose the larger experiment
(Table 9.20b).
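The quantities compared in this example follow directly from the design parameters, so the comparison is easy to repeat for other candidate designs. The following Python sketch is our own illustration (the function name and dictionary keys are hypothetical). It computes the EF, the multiplier of s in the within-block SE for treatment comparisons, and the ResDF, using the formulae given in the text.

```python
import math


def within_block_summary(t, m, n, u, lam):
    """Summary of a BIBD for the within-block analysis.

    Returns the total number of units, the efficiency factor, the
    multiplier k such that SE of a treatment difference = k * s,
    and the residual df in the Block.Unit stratum.
    """
    total_units = m * u
    ef = (lam * t) / (n * u)
    return {
        "N": total_units,
        "EF": ef,
        "SE multiplier": math.sqrt(2 / (n * ef)),
        "ResDF": total_units - m - t + 1,
    }


design_a = within_block_summary(t=7, m=7, n=3, u=3, lam=1)   # Table 9.20a
design_b = within_block_summary(t=7, m=7, n=4, u=4, lam=2)   # Table 9.20b

for label, summary in (("(a)", design_a), ("(b)", design_b)):
    print(label, {key: round(value, 3) for key, value in summary.items()})

reduction = 1 - design_b["SE multiplier"] / design_a["SE multiplier"]
print(f"Relative reduction in SE: {reduction:.1%}")
```

The comparison of the two SEs assumes, as in the example, that s would be similar in the two experiments. If the larger blocks were noticeably less homogeneous, the advantage of the second design would be smaller.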



9.3.3 Drawbacks and Variations of the Balanced Incomplete Block Design 

The BIBD is usually a good design when the number of treatments is only a little larger
than the number of units within each block. The main drawback of these designs is that a
BIBD might not exist for a given combination of treatment number, replication level and
block size. Even when a BIBD does exist, it might require many more replicates of
each treatment than is practicable (recall that the number of blocks must be at least equal
to the number of treatments). The explanatory structure can be extended to accommodate
factorial and other structures (see Chapter 8), although requirements of balance across
individual model terms then impose an even greater restriction on the available designs.
For these reasons, several different classes of partially balanced incomplete block designs
have been developed, dividing the treatment pairs into two or more groups. Treatment
pairs within each group then have a different value of λ, so that different groups of
treatment comparisons are estimated with different levels of precision. These designs can relax
some of the practical constraints imposed by BIBDs whilst retaining some advantages of
a balanced design. Some classes of partially balanced incomplete block designs can be
analysed with algorithms for multi-stratum ANOVA, but most can be analysed only with
more general algorithms, such as those associated with linear mixed models (see Section
11.6 and Chapter 16). Issues of balance and orthogonality associated with these and other
forms of design are discussed further in Chapter 11. Partially balanced incomplete block
designs and unbalanced designs are discussed by Mead et al. (2012, Chapters 7 and 9).



EXERCISES 

9.1 A 5 × 5 LS design was used to investigate the effect of sulphur fertilizer on the
yield (tonnes/ha) of spring barley grown on a light soil. Five levels of fertilizer
were applied (0, 10, 20, 30 and 40 kg S). File sulphur.dat contains the plot
numbers (Plot), structural factors (Row, Col), the treatment factor (Sulphur)
and the grain yield (variate Grain). Write down the full model for the yields in
both mathematical and symbolic form. Analyse the data and state your
conclusions. What other hypotheses might you like to test? (We re-visit these data in
Exercise 17.4.)*

9.2 An experiment used three incubators to compare growth of fungal colonies of
Metarhizium anisopliae at three temperatures (23°C, 30°C and 35°C; Wright, 2013).
Replication of temperatures was achieved by repeating the experiment on three
occasions. Temperatures were allocated to incubators according to a 3 × 3 LS, so
each incubator ran once at each temperature. Small fungal plugs were placed in
Petri dishes and three dishes were placed in each incubator on each occasion.
The sizes of the fungal colonies were recorded after four days. The dish
numbers (ID), structural factors (Incubator, Occasion and Dish), explanatory factor
(Temperature) and size measurements (variate Size) are given in file size.dat.
Write down the structural component of the model for the colony sizes. Analyse
the data and state your conclusions. What can you say about the effect of
temperature on the growth of these fungal colonies?†

9.3 A three-year field trial was set up to investigate the susceptibility of six
varieties of lily to the lily beetle, L. lilii (Salisbury et al., 2010). The trial was laid out
as two independent 6 × 6 LSs. Regular counts of beetle adults, eggs and larvae
were made between May and early August each year. The file lily.dat contains
the unit numbers (ID), structural factors (Square, Row, Column), explanatory
factor (Variety) and the total count of larvae observed during 2006 (variate Larvae).
Analyse these data on an appropriate scale. Are these lily varieties equally
susceptible?‡

* Data from S. McGrath, Rothamsted Research.

† Data from E. Wright, Rothamsted Research.

‡ Data from A. Salisbury, Royal Horticultural Society/Rothamsted Research/Imperial College London.






9.4 A series of field experiments tested various 'push-pull' strategies to control
insect pests in oilseed rape. In one experiment the use of turnip rape (TR) as
an earlier flowering trap crop (the 'pull') was tested alongside use of an
anti-feedant applied to oilseed rape in spring (S; the 'push'). Untreated oilseed rape
(U) was included as a control. The experiment was set up as a 6 × 6 LS with two
replicates of each of the three treatments per row and column. An assessment
of adult pollen beetle numbers was made on 10 plants per plot in early April,
one day post-spray of the anti-feedant. The unit numbers (ID), structural factors
(Row, Column), treatment factor (Treatment) and mean pollen beetle count per
plot (variate Count) are given in file pollenbeetles.dat. Is there any evidence
that either of the pull or push strategies works?*

9.5 A field experiment investigated the effect of four herbicides (A, B, C, D) on the
yield of three varieties of onions (V1, V2, V3). The herbicides could only be
applied to relatively large areas of land due to the width of the spray boom, so
the experiment was set up as a SP design with three blocks of four main plots
to which the herbicides were applied. Each main plot comprised three subplots
to which the varieties were allocated. The final yield of onions per subplot was
recorded at harvest. The file onions.dat contains the unit numbers (ID),
structural factors (Block, MainPlot, Subplot), explanatory factors (Herbicide, Variety)
and final yields (variate Yield). Write down the structural and explanatory
components of the model for the onion yields. Analyse these data and summarize
your conclusions.

9.6 A field experiment studied forms and rates of nutrient application and the
effect on the yield of spring barley in the presence or absence of foliar diseases.
Nitrogen fertilizer was applied either in a liquid form, alone (L) or with a
nitrification inhibitor added (LI), or in a solid form, to the seedbed (SS), as a
top-dressing (ST) or split (half to the seedbed and half as top-dressing, SST). Each form
was applied at two rates (70 and 110 kg N/ha), giving 10 nutrient treatments in
total. The occurrence of foliar diseases was intended to be manipulated by a 2 × 2
factorial in the presence or absence of a mildew fungicide (None, Tridemorph)
and a rust fungicide, but no rust developed and so the latter fungicide was not
applied. The trial used a SP design with two blocks. The 10 nutrient treatments
were applied to main plots, each of which was split into four subplots, and
the mildew fungicide was applied to two subplots in each main plot. The plot
numbers (ID), structural factors (Block, MainPlot, Subplot), explanatory factors
(NForm, NRate, MildewF) and yield at harvest (variate Yield, tonnes/hectare at
85% dry matter) are in file springbarley.dat. Identify a suitable predictive model
and comment on the comparison between liquid and solid forms of fertilizer.†

* Data from L. Smart, Rothamsted Research.

† Data from J. Jenkyn, Rothamsted Research.

9.7 A field experiment compared the effects of three strains of barley yellow dwarf
virus (BYDV, a virus transmitted by aphids) on yield of two varieties of winter
barley, one (Vixen) with genetic resistance to BYDV and the other susceptible
(Igri). The experiment aimed to test the efficacy of any resistance, its consistency
across the strains, and the effectiveness of insecticide sprays at different times
of the year (Cypermethrin in October or December, or Pirimicarb in March).
A SP design was used with five blocks of six main plots each split into four
subplots. The six combinations of variety and spray timing were applied to main
plots within blocks. The virus strains (MAV, PAV or RPV) were applied to
subplots by releasing infected aphids in the centre of the subplot; one subplot was
left uninoculated in each main plot. The plot numbers (ID), structural factors
(Block, MainPlot, Subplot), explanatory factors (Variety, Spray, Strain) and yield
at harvest (variate Yield, tonnes/hectare at 85% dry matter) are in file bydv.dat.
Analyse these data and relate your conclusions to the experimental aims stated
above.*

9.8 An experiment assessed the response of two aphid clones to a foliar insecticide
applied to cabbage plants. The experiment used two simulators, each containing
six plants in individual pots. All plants in one simulator were sprayed with the
insecticide and all plants in the other were sprayed with water only (control).
Two weeks after spraying adult aphids were placed onto the plants using clip
cages. Two clip cages were attached to each plant, one containing three aphids of
a clone susceptible to the insecticide, the other containing three aphids of a
moderately resistant clone. The number of nymphs produced by the adults in each
clip cage was recorded after two days. The experiment was then repeated using
two new simulators. File simulator.dat contains the unit numbers (ID),
structural factors (Expt, DSimulator, Plant, DCage), explanatory factors (Treatment,
Clone) and the nymph counts (variate Nymphs). Determine the structural and
explanatory components for this experiment, write down the full model in
symbolic form and state the experimental units for the insecticide and clone
treatments. Analyse the data and verify that the explanatory terms are tested in the
correct strata. Identify and interpret the predictive model. (We re-visit these data
in Exercise 16.1.)†

9.9 An experiment to compare yields of 13 varieties of corn was set up as a BIBD
with 13 blocks, each containing four plots (Cochran and Cox, 1957, Table 11.2). File
corn.dat contains the unit numbers (ID), structural factors (Block, Plot),
explanatory factor (Variety) and plot yields (variate Yield, pounds per plot). Calculate
λ and the efficiency factor for this design. Is the design resolvable? Is there any
evidence of differences in yield among the varieties?

* Data from R. Plumb, Rothamsted Research.

† Data from S. Foster, Rothamsted Research.

In Chapter 3, we examined the principles of replication, randomization and blocking that
are central to the construction of efficient designs. However, in doing so we did not say
how to choose the number of replicates to be used for each treatment. The question 'How
many replicates do I need?' is probably the most common question posed to consultant
statisticians, but the answer is rarely obvious! As the number of replicates increases,
smaller differences among a set of treatments can be detected because more information
becomes available. Conversely, with few replicates or large background variation,
or both, we might not detect differences between treatment population means as
statistically significant even if some of those differences are large and biologically meaningful.
In general, the replication required in a study depends on numerous (possibly competing)
features, such as

• The available resources (money, experimental material, space and time)

• Treatment structure

• Size of treatment difference(s) to be detected

• Relative importance of different treatment comparisons

• Risks associated with wrong decisions (false-positive or false-negative results)

• Variability associated with the experimental units and measurement process

The first five items in this list are usually either a matter of choice or restricted by
practical considerations. The risks of making wrong decisions can be related to the ideas
of hypothesis testing introduced in Section 2.3.2. However, the last item, namely the
variation in the data, which we have called background variation and denoted σ², is not
under the control of the experimenter, is inherent to the process under study and is often
unknown.

In this chapter, we discuss how to determine the number of replicates to be included in
an experiment. To assess the required replication we must specify the minimum size of
a true treatment difference (i.e. the difference between population means for two
treatments) that should be detected as statistically significant (for a given significance
level $\alpha_s$). First, we describe some simple approximate methods to determine the number
of replicates required for an experiment, based on the required size of treatment difference
and the estimated LSD (Section 10.1). These methods illustrate the importance of obtaining
a good estimate of the background variation both before the experiment and within the
analysis (Section 10.2). The important concept of the power of a design, which gives the
probability of detecting a treatment difference of a given size, is then introduced (Section
10.3). An example is used to illustrate these ideas for a particular scenario (Section 10.4).
Finally, the usual null hypothesis (of no treatment differences) is not useful when the
purpose of an experiment is to illustrate the equivalence of, rather than the difference
between, treatments. In this case an alternative strategy of two one-sided t-tests (TOST) is
often used to give a more powerful test (Section 10.5).

10.1 Simple Methods for Determining Replication

In this section, we suppose that we wish to detect an observed difference of a given size
between two treatments. We denote this observed difference as d to distinguish it from the
true (but unknown) difference between the treatment population means, which we denote
as δ. In the designs considered thus far, the least significant difference (LSD) (Section 4.4)
has been defined as the smallest observed difference between two treatments that will
be detected as statistically significant at a specified significance level. In this section, we
determine replication in terms of the LSD, directly in Section 10.1.1 and indirectly, via the
coefficient of variation, in Section 10.1.2. Both of these cases are illustrated for the CRD
with equal replication, with extensions to other designs given in Section 10.1.3.

10.1.1 Calculations Based on the LSD 

Initially, we focus on the CRD and consider an experiment with equal replication (n) for
each of t treatments using a total of N = n × t experimental units. In Section 4.4, the LSD
between two treatments in a CRD with equal replication was derived as

$$\mathrm{LSD} = t_{N-t}^{[\alpha_s/2]} \times \mathrm{SED} = t_{N-t}^{[\alpha_s/2]} \times \sqrt{\frac{2s^2}{n}} , \qquad (10.1)$$

where $t_{N-t}^{[\alpha_s/2]}$ is the 100(1 − $\alpha_s$/2)th percentile for the t-distribution with N − t df (the residual
df, ResDF, for the CRD). However, in making calculations prior to experimentation, the
estimate $s^2$ is not available. Unfortunately, obtaining a realistic pre-experiment estimate of
$s^2$ is often difficult, and we discuss strategies for overcoming this problem in Section 10.2.
For now, we assume that an appropriate value is available.

The LSD indicates the size of estimated (or observed) treatment differences that should
be detected as statistically significant (at significance level $\alpha_s$) by ANOVA. If we wish to
detect an observed difference d between two treatments as significant, then it follows that
we want LSD < d or, from Equation 10.1,

$$t_{\mathrm{ResDF}}^{[\alpha_s/2]} \times \sqrt{\frac{2s^2}{n}} < d . \qquad (10.2)$$

We can systematically evaluate the left-hand side of this inequality for increasing values of
n: the required replication is the smallest value of n for which this inequality is satisfied.



EXAMPLE 10.1A: SAMPLE SIZE CALCULATIONS FOR A NEW
CALCIUM POT TRIAL*

A scientist is planning a follow-up experiment to the calcium pot trial presented in
Example 4.1 to confirm these results. This new experiment will again be a CRD with
t = 4 treatments and, based on the previous experiment, $s^2$ is expected to be approximately
75 (s = 8.66). Observed treatment differences of d = 10 cm are required to be
detected as statistically significant with $\alpha_s$ = 0.05. By systematic evaluation of the LSD
for different values of n, as shown in Table 10.1, we find that a replication of 7 is the
smallest value that gives a value of the LSD less than 10.

TABLE 10.1

Calculation of SED and LSD for a CRD with t = 4 Treatments, Varying
Replication (n) and Estimated Residual Variance $s^2$ = 75 (Example 10.1A)

Replication (n)   Units (N = n × t)   Residual df (N − t)   $t_{N-t}^{[0.025]}$   SED    LSD
2                 8                   4                     2.776                 8.66   24.04
3                 12                  8                     2.306                 7.07   16.31
4                 16                  12                    2.179                 6.12   13.34
5                 20                  16                    2.120                 5.48   11.61
6                 24                  20                    2.086                 5.00   10.43
7                 28                  24                    2.064                 4.63    9.55

Since the true value of the background variation is unknown, it is sensible to verify
the impact of a range of values of $s^2$. For example, here we might realistically expect $s^2$ to
lie between 50 and 100. As shown in Table 10.1, a CRD experiment with n = 7 replicates
of each of t = 4 treatments has a 5% critical t-value of $t_{24}^{[0.025]} = 2.064$. The possible
range of the SED is then calculated as

$$\text{minimum(SED)} = \sqrt{2 \times 50/7} = 3.78 ,$$
$$\text{maximum(SED)} = \sqrt{2 \times 100/7} = 5.35 ,$$

and the corresponding LSD values are

$$\text{minimum(LSD)} = t_{24}^{[0.025]} \times \text{minimum(SED)} = 2.064 \times 3.78 = 7.79 ,$$
$$\text{maximum(LSD)} = t_{24}^{[0.025]} \times \text{maximum(SED)} = 2.064 \times 5.35 = 11.02 .$$

So, with seven replicates for each treatment, the smallest observed treatment difference
that we can detect lies in the range 7.8-11.0 cm. If this worst-case scenario is unacceptable
then we might consider further increasing the replication, or take additional measures to
reduce background variation (if this is possible).
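
A sensitivity check of this kind is easy to script. The sketch below assumes the critical value 2.064 on 24 df from Table 10.1 and evaluates the LSD for three trial values of the residual variance; the helper name is hypothetical.

```python
import math

T_CRIT = 2.064   # two-sided 5% critical t-value on 24 df (Table 10.1)
N_REPS = 7       # replicates per treatment


def lsd_for_variance(s2, n=N_REPS, t_crit=T_CRIT):
    """LSD for a CRD with n replicates per treatment and residual variance s2."""
    return t_crit * math.sqrt(2.0 * s2 / n)


# LSD, rounded to one decimal place, across a plausible range of s^2.
lsd_range = {s2: round(lsd_for_variance(s2), 1) for s2 in (50.0, 75.0, 100.0)}
print(lsd_range)
```

The extremes of the range reproduce, to one decimal place, the interval of 7.8-11.0 cm quoted in the example.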

If the replication required exceeds the resources available then some compromise must 
be found. For example, some treatments might be eliminated to enable increased replica- 
tion of the remaining treatments, or reduced precision might be accepted. This is dis- 
cussed further at the end of Section 10.3. 



10.1.2 Calculations Based on the Coefficient of Variation 

The coefficient of variation (%CV) for a sample is defined as

$$\%\mathrm{CV} = 100 \times s/\bar{y} ,$$

where s is the unbiased sample standard deviation and $\bar{y}$ the sample mean (Section 2.1).
The %CV can be a useful measure for evaluating the quality of experiments where the
background variation increases with the mean, as the %CV is often quite stable for
successful experiments. An increase in %CV then indicates an unexpectedly large value of
background variation and hence some problem with the trial.

The LSD can be rewritten, in terms of the %CV and sample mean, by multiplying both
the numerator and denominator of Equation 10.1 by 100, and dividing both by the sample
mean to obtain

$$\mathrm{LSD} = t_{N-t}^{[\alpha_s/2]} \times \sqrt{\frac{2}{n}} \times s
= t_{N-t}^{[\alpha_s/2]} \times \sqrt{\frac{2}{n}} \times \frac{100 s/\bar{y}}{100/\bar{y}}
= t_{N-t}^{[\alpha_s/2]} \times \sqrt{\frac{2}{n}} \times \left(\frac{\%\mathrm{CV}}{100}\right) \times \bar{y} .$$

The LSD can thus be evaluated as a proportion of the mean for different levels of replication.
In this form, a suitable estimate of %CV rather than $s^2$ is required, so if acceptable ranges
of %CV are well established, this may provide a more useable approach to the calculation
of an appropriate replication, as illustrated in Example 10.1B.

EXAMPLE 10.1B: SAMPLE SIZE CALCULATIONS FOR A NEW
CALCIUM POT TRIAL*

The %CV for the calcium pot trial of Example 4.1 was 14% and experience of similar
experiments suggests that the %CV should be at worst 20%. Example 10.1A suggested
that a follow-up experiment should have replication n = 7. The LSD between two
treatment means each with seven replicates is estimated for %CV = 14 by

$$\mathrm{LSD} = t_{24}^{[0.025]} \times \sqrt{\frac{2}{7}} \times \frac{14}{100} \times \bar{y} = 2.064 \times \sqrt{\frac{2}{7}} \times \frac{14}{100} \times \bar{y} = 0.15\bar{y} ,$$

and for %CV = 20 by

$$\mathrm{LSD} = t_{24}^{[0.025]} \times \sqrt{\frac{2}{7}} \times \frac{20}{100} \times \bar{y} = 2.064 \times \sqrt{\frac{2}{7}} \times \frac{20}{100} \times \bar{y} = 0.22\bar{y} .$$

Hence, in the worst case, we expect to detect any observed difference between two
treatments that is larger than 22% of the overall mean response as statistically significant
(at the 5% level). If the new experiment is as precise as the previous experiment,
with %CV = 14, this decreases to 15% of the overall mean response.
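
The %CV form of the LSD can be checked numerically. The following sketch uses a hypothetical helper and takes the critical value 2.064 (24 residual df) from Table 10.1 as given.

```python
import math


def lsd_fraction_of_mean(cv_percent, n, t_crit):
    """LSD expressed as a fraction of the sample mean: t_crit * sqrt(2/n) * %CV/100."""
    return t_crit * math.sqrt(2.0 / n) * cv_percent / 100.0


T_CRIT = 2.064  # two-sided 5% critical t-value on 24 df (Table 10.1)

# Example 10.1B: seven replicates, %CV of 14 (previous trial) and 20 (worst case).
ratios = {cv: lsd_fraction_of_mean(cv, n=7, t_crit=T_CRIT) for cv in (14.0, 20.0)}
for cv, ratio in ratios.items():
    print(f"%CV = {cv:.0f}: LSD is about {ratio:.2f} x mean")
```

Because the result is a proportion of the mean, the same function applies to any response for which a plausible %CV is known, without needing a value of the residual variance.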



10.1.3 Unequal Replication and Models with Blocking 

In the calculations above we assumed the simplest experimental design of a CRD with
equal replication. In general, a more complex design might be used, perhaps with unequal
replication. Calculations for the case of unequal replication follow directly from the
formula for the SED between two treatments with replication $n_i$ and $n_j$, respectively, i.e.

$$\mathrm{SED} = \sqrt{s^2 \left( \frac{1}{n_i} + \frac{1}{n_j} \right)} .$$

The extension to more complex designs is similarly straightforward, requiring only that
the appropriate form of the SED is substituted into Equations 10.1 and 10.2, and that the
appropriate ResDF are applied to obtain the critical value of the t-distribution, which can be
expressed in more general form as $t_{\mathrm{ResDF}}^{[\alpha_s/2]}$. For example, for a RCBD with n blocks and t
treatments, the residual df must be adjusted to ResDF = (n − 1) × (t − 1) (see Section 7.3), whilst
for a LS design with t treatments the residual df must be adjusted to ResDF = (t − 1) × (t − 2)
(see Section 9.1).
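
As a minimal sketch of these extensions, the SED for unequal replication and the residual df for the three designs mentioned can be written as small helper functions. The function names are hypothetical, and the unequal replications of 10 and 5 are illustrative values only.

```python
import math


def sed_unequal(s2, n_i, n_j):
    """SED between two treatment means with replication n_i and n_j."""
    return math.sqrt(s2 * (1.0 / n_i + 1.0 / n_j))


def res_df_crd(n, t):
    """Residual df for a CRD with n replicates of each of t treatments (N - t)."""
    return n * t - t


def res_df_rcbd(n_blocks, t):
    """Residual df for a RCBD with n_blocks blocks and t treatments."""
    return (n_blocks - 1) * (t - 1)


def res_df_latin_square(t):
    """Residual df for a Latin square design with t treatments."""
    return (t - 1) * (t - 2)


# With equal replication the SED reduces to sqrt(2 * s2 / n), as in Equation 10.1.
print(round(sed_unequal(75.0, 7, 7), 2))    # 4.63
print(round(sed_unequal(75.0, 10, 5), 2))   # illustrative unequal replication
print(res_df_crd(7, 4), res_df_rcbd(7, 4), res_df_latin_square(4))
```

For the same number of units, the blocked designs have fewer residual df than the CRD, so the critical t-value is slightly larger.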



10.2 Estimating the Background Variation 

The methods presented in Section 10.1 require a plausible value (or range of values) for the
estimated background variation, $s^2$, to be available before the experiment is done. In some
cases, the %CV can be a useful alternative, but this is often not available, and so a strategy
to obtain a 'reasonable' estimate of $s^2$ (or the %CV) is required.

The simplest option is to obtain estimates of variation from previous studies that used
similar experimental units under similar conditions. Ideally, experiments from the same
institution (or laboratory) should be used, as long as the study conditions and protocols are 
analogous. Another, albeit more expensive, alternative is to do a preliminary (or pilot) study, 
using a subset of the proposed experimental treatments to establish the size and sources 
of variability (see e.g. Case Study 19.1). Such preliminary studies are often used in labora- 
tory work to calibrate new experimental techniques. Published reports or papers describing 
similar experiments are another possible source of information, but these may provide less 
reliable estimates if insufficient detail is given or if the experimental conditions are different. 

If none of these options is available then a mixture of common sense and good guesswork is 
required. If the expected range of values (for a single treatment) is known, and these observa- 
tions are expected to follow a Normal distribution, then the properties of this distribution can 
be used. It is well known that 95% of the observations from a Normal distribution are found 
within approximately two standard deviations of the population mean (Figure 2.4). Therefore, 
if the likely minimum and maximum values for experimental units receiving the same
treatment can be predicted then the population standard deviation $\sigma$ can be approximated as

$$\sigma \approx \frac{\text{maximum} - \text{minimum}}{4} ,$$

and this can be substituted for the estimate s. If there is much uncertainty about the likely
variation then consideration of a range of possible values may be helpful (as in Example 10.1A).
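
The range-based approximation amounts to a one-line calculation. In the sketch below the minimum and maximum (20 cm and 60 cm) are hypothetical values chosen purely for illustration, and the function name is an assumption.

```python
def sd_from_range(minimum, maximum):
    """Approximate the standard deviation as one quarter of the expected range.

    Assumes Normally distributed observations, so that about 95% of values
    lie within two standard deviations either side of the mean.
    """
    return (maximum - minimum) / 4.0


# Hypothetical example: responses under one treatment expected between 20 and 60 cm.
sigma = sd_from_range(20.0, 60.0)
print(sigma, sigma ** 2)  # 10.0 100.0
```

The squared value can then stand in for the residual variance in the replication calculations of Section 10.1.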

When we do an experiment, we obtain a new estimate of the background variation and 
the precision of that estimate increases as the residual df increases, reflecting the amount 
of information available. For this same reason, the critical value $t_{\mathrm{ResDF}}^{[\alpha_s/2]}$ decreases as the
residual df increases. As a rule of thumb, to ensure a reasonable estimate of the back- 
ground variation the replication should be sufficiently large to give at least 10 residual df. 
As the gain in precision decreases as the residual df increases further, there is usually little 
advantage in having more than 20 residual df (see Chapter 19). 



10.3 Assessing the Power of a Design 

The calculations in Section 10.1 used the estimated LSD to assess whether an experiment
would detect an observed treatment difference of a given size (denoted d). In practice, we
are more interested in the true treatment difference ($\delta$), which cannot be observed directly
and is estimated with error. Because of the stochastic nature of this error, we can use
probability calculations to evaluate whether a true treatment difference of size $\delta$ is likely to
be detected. The probability that a true treatment difference of size $\delta$ will be detected as
statistically significant is called the power of the test, and hence of the design. The power
is a function of the size of the treatment difference $\delta$.

For any statistical test, the significance level and power are related to errors of inference
that may occur when a given hypothesis is tested. The terminology associated with these
inferential errors and the related probabilities is summarized in Table 10.2.

A Type I error occurs when $H_0$ is rejected when it is true, i.e. a false-positive conclusion,
for example, that the population means differ when in fact they are equal. The probability
of a Type I error occurring is denoted $\alpha_s$, i.e. Prob(Type I error) = $\alpha_s$. As mentioned
previously (Section 2.3.2), $\alpha_s$ is the pre-determined significance level (or size) of a test, and is
often chosen to be 0.05.

A Type II error occurs when $H_0$ is not rejected when it is false, i.e. a false-negative
conclusion, for example, that the population means are equal when in fact they differ. The
probability of a Type II error occurring is denoted $\beta_s$, i.e. Prob(Type II error) = $\beta_s$. The power
of a test is directly associated with the Type II error rate and is defined as the probability
of making the correct decision to reject $H_0$ when $H_0$ is false, so power = 1 − $\beta_s$. Tests with
large power (and therefore small $\beta_s$) are preferred, as they give a larger chance of detecting
treatment differences for a given design. However, there is a relationship between the
Type I and Type II error rates ($\alpha_s$ and $\beta_s$) that usually makes some compromise on either
significance or power inevitable.

To demonstrate these concepts we consider a test concerning population means, $\mu_1$ and
$\mu_2$, for two equally replicated treatments assessed in an experiment with a CRD. The null
hypothesis of no difference, $H_0$: $\mu_1 = \mu_2$ or $H_0$: $\delta = \mu_1 - \mu_2 = 0$, is to be tested against the
one-sided alternative hypothesis that the population mean of the first group is larger, $H_1$: $\mu_1 > \mu_2$
or $H_1$: $\delta = \mu_1 - \mu_2 > 0$. We use a one-sided test here for simplicity, but the same concepts
extend to the two-sided case. We make the usual assumptions about the deviations (see
Section 4.1). Figure 10.1 illustrates the relationship between the significance level and the
power of the t-test for assessing these hypotheses as the value of the true treatment
difference, $\delta = \mu_1 - \mu_2$, varies. The curves represent the sampling distributions of the observed
test statistic, i.e. the random variable $t = (\hat{\mu}_1 - \hat{\mu}_2)/\mathrm{SED}$, in different situations.

In both graphs, the left-hand curve represents the situation when the null hypothesis is
true ($\delta = 0$). This is a t-distribution with mean zero and df equal to the ResDF used to
estimate the SED (in this figure, ResDF = 24). The right-hand curve shows the distribution of the
observed t-statistic under the alternative hypothesis. In Figure 10.1a, the true difference is



TABLE 10.2

Terminology for Inferential Errors and Probabilities
Associated with a Hypothesis Test

                                        Decision (Probability)
Null hypothesis ($H_0$)   Accept $H_0$                                  Reject $H_0$
True                      Correct decision (1 − $\alpha_s$)              Incorrect decision (Type I error, $\alpha_s$)
False                     Incorrect decision (Type II error, $\beta_s$)  Correct decision (Power, 1 − $\beta_s$)

FIGURE 10.1

Definition of Type I (grey area, $\alpha_s$) and Type II (black area, $\beta_s$) error probabilities for the t-test of the null
hypothesis of no treatment differences ($H_0$: $\delta = 0$) against a one-sided alternative hypothesis ($H_1$: $\delta > 0$) for the difference
$\delta$ between two treatment population means: (a) $\delta = 2.5 \times \mathrm{SED}$, (b) $\delta = 3.5 \times \mathrm{SED}$. $t_{\mathrm{ResDF}}^{[\alpha_s]}$ denotes the 100(1 − $\alpha_s$)th
percentile of a t-distribution with ResDF df.



$\delta = 2.5 \times \mathrm{SED}$, and in Figure 10.1b, the difference is slightly larger, with $\delta = 3.5 \times \mathrm{SED}$. These
distributions are non-central t-distributions with non-centrality parameter $\delta/\mathrm{SED}$ and
ResDF df (see Section 2.2.4). The dotted vertical lines mark the median of each distribution.
The dashed vertical line shows the critical value for the t-test, equal to the 100(1 − $\alpha_s$)th
percentile of the t-distribution with ResDF df, i.e. $t_{\mathrm{ResDF}}^{[\alpha_s]}$. Any observed difference between
the treatment means that gives a t-statistic greater than this critical value (to the right of the
dashed line) is declared as significantly different from zero, and any observed difference that
gives a t-statistic smaller than this critical value (to the left of the dashed line) is declared as
not significantly different from zero. The grey-shaded area corresponds to the rejection region
of size $\alpha_s$, and the black-shaded area corresponds to the Type II error of size $\beta_s$ in each case.
The power is equal to 1 − $\beta_s$ (i.e. the non-shaded area of the right-hand distribution). The grey
area stays the same size whatever the true value of the difference, $\delta$, whereas the size of the
black area changes as $\delta$ changes: as $\delta$ increases, the black area ($\beta_s$) decreases, and the power
increases; as $\delta$ decreases, the black area ($\beta_s$) increases and the power decreases. The power
function of a test expresses the power as a function of $\delta$. In practice, it is often easiest to state
the size of difference $\delta$ that the test is required to detect and to calculate the corresponding power.

Figure 10.1 also indicates how characteristics of the test influence power. For example,
using a more stringent significance level (i.e. decreasing $\alpha_s$, e.g. $\alpha_s$ = 0.01 instead of $\alpha_s$ = 0.05)
shifts the critical value (dashed line) to the right and increases the black area ($\beta_s$), thus
reducing the power. A common compromise is to aim for a test with $\alpha_s$ = 0.05 and $\beta_s$ = 0.20
(power = 0.80) for a given treatment difference. A decrease in the SED, through a decrease
in background variation, increase in replication or increase in the ResDF, makes the two
distributions narrower so that their overlap decreases and hence the power increases.

For our example of a treatment comparison in a CRD, the calculations are straightforward.
We calculate power in terms of the distribution of the test statistic under
the alternative hypothesis, which is a non-central t-distribution with non-centrality
parameter $\delta/\mathrm{SED}$ and df equal to the ResDF (defined in Section 2.2.4). Probability
functions for these distributions are present in most statistical software, although not
commonly available in statistical tables. We denote the cumulative distribution function
for the non-central t-distribution with non-centrality parameter c on D df at value x as
$F_t(x, c, D)$. The power of a one-sided test with $H_1$: $\delta > 0$ is the probability of rejecting the
null hypothesis if the alternative hypothesis is true, and can be calculated as

$$\begin{aligned}
\mathrm{Power}(\delta) &= \mathrm{Prob}(H_0 \text{ rejected} \mid \mu_1 - \mu_2 = \delta) \\
&= \mathrm{Prob}(t > t_{\mathrm{ResDF}}^{[\alpha_s]} \mid \mu_1 - \mu_2 = \delta) \\
&= 1 - F_t(t_{\mathrm{ResDF}}^{[\alpha_s]}, \delta/\mathrm{SED}, \mathrm{ResDF}) ,
\end{aligned}$$

i.e. the portion of the non-central t-distribution that exceeds the critical value. The power
of a one-sided test with $H_1$: $\delta < 0$ is calculated similarly as

$$\mathrm{Power}(\delta) = F_t(-t_{\mathrm{ResDF}}^{[\alpha_s]}, \delta/\mathrm{SED}, \mathrm{ResDF}) ,$$

i.e. the portion of the non-central t-distribution that is less than the critical value. This
calculation uses the symmetry of the t-distribution to derive $t_{\mathrm{ResDF}}^{[1-\alpha_s]} = -t_{\mathrm{ResDF}}^{[\alpha_s]}$. Not
surprisingly, the power of a two-sided test, with $H_1$: $\delta \neq 0$, combines these two expressions, having
adjusted the critical value, and is calculated as

$$\mathrm{Power}(\delta) = 1 - F_t(t_{\mathrm{ResDF}}^{[\alpha_s/2]}, \delta/\mathrm{SED}, \mathrm{ResDF}) + F_t(-t_{\mathrm{ResDF}}^{[\alpha_s/2]}, \delta/\mathrm{SED}, \mathrm{ResDF}) ,$$

i.e. the portion of the non-central t-distribution that lies outside of the two critical values.

EXAMPLE 10.1C: SAMPLE SIZE CALCULATIONS FOR A NEW
CALCIUM POT TRIAL*

In Example 10.1A, we considered a follow-up experiment to the calcium pot trial
originally introduced in Example 4.1. This had four treatments (t = 4), background
variation of $s^2$ = 75 (s = 8.66), with a requirement to detect observed treatment differences of
10 cm at a significance level of $\alpha_s$ = 0.05. The simple approach of Section 10.1 required
replication of n = 7 for each treatment. Now we also want to consider the power
associated with this design. For a two-sided test with ResDF = 24, the critical values are
$-t_{24}^{[0.025]}$ and $t_{24}^{[0.025]}$, equal to ±2.064. From Table 10.1, for seven replicates the SED is equal to 4.63.

The non-centrality parameter is then $\delta/\mathrm{SED}$ = 10/4.63 = 2.16. The CDF of the non-central
t-distribution satisfies

$$F_t(-2.064, 2.16, 24) < 0.0001, \qquad F_t(2.064, 2.16, 24) = 0.455 ,$$

and hence the power for a two-sided test with $\delta$ = 10 is 1 − 0.455 + 0.000 = 0.545. This
means that with seven replicates, we have only a 55% chance of detecting a true
treatment difference of size 10 cm, given that our assumptions about the background
variation are true. Table 10.3 shows the power for greater replication, and replication of n = 13
pots per treatment (with a total of N = 52 pots) is necessary to get power greater than
0.80 for a difference of 10 cm.
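
Where software for the non-central t-distribution is not to hand, the power can be checked approximately by simulation. The sketch below is not the method used in the example: it draws the test statistic directly from its definition as a non-central t variable (a Normal variate with mean $\delta/\mathrm{SED}$ divided by the square root of an independent chi-square variate over its df) and counts how often it falls outside the two critical values. The function name, number of simulations and seed are arbitrary assumptions.

```python
import math
import random


def simulate_power_two_sided(ncp, res_df, t_crit, n_sim=100_000, seed=1):
    """Monte Carlo estimate of the power of a two-sided t-test.

    ncp    : non-centrality parameter, delta / SED
    res_df : residual degrees of freedom
    t_crit : two-sided critical value (e.g. 2.064 for 24 df at the 5% level)
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        z = rng.gauss(ncp, 1.0)                      # Normal(ncp, 1) numerator
        chi2 = rng.gammavariate(res_df / 2.0, 2.0)   # chi-square with res_df df
        t_stat = z / math.sqrt(chi2 / res_df)        # non-central t variate
        if abs(t_stat) > t_crit:
            rejections += 1
    return rejections / n_sim


# Example 10.1C: delta = 10, SED = 4.63, 24 residual df, critical value 2.064.
power = simulate_power_two_sided(ncp=10.0 / 4.63, res_df=24, t_crit=2.064)
print(round(power, 2))
```

The estimate should fall close to the value of 0.545 obtained from the non-central t CDF, with a simulation standard error of roughly 0.002 for this number of draws.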

TABLE 10.3

Calculation of SED and Power for a Difference of Size $\delta$ = 10 Units in a
CRD with t = 4 Treatments, Varying Replication (n) and Estimated
Residual Variance $s^2$ = 75 (Example 10.1C)

Replication (n)   Units (N = n × t)   Residual df (N − t)   SED    $\delta$/SED   $t_{N-t}^{[0.025]}$   Power
7                 28                  24                    4.63   2.16           2.064                 0.545
8                 32                  28                    4.33   2.31           2.048                 0.606
9                 36                  32                    4.08   2.45           2.037                 0.661
10                40                  36                    3.87   2.58           2.028                 0.710
11                44                  40                    3.69   2.71           2.021                 0.753
12                48                  44                    3.54   2.83           2.015                 0.790
13                52                  48                    3.40   2.94           2.011                 0.822

In principle, power can be calculated for any statistical test, but the calculations are often
quite complex. In the context of ANOVA, we are often interested in the null hypothesis
that a set of treatment population means are all equal against a general alternative
hypothesis of some difference between population means, evaluated by using an F-test. Power
calculations for this test are more complicated and details can be found in Montgomery
(1997); however, the concepts are similar and this problem corresponds to an extension of
the situation illustrated above. Most statistical software contains facilities to determine the
power of standard designs, such as those described in previous chapters.

It is always useful to calculate the power of potential designs for an experiment,
preferably using a range of plausible values for background variation. If it is possible to use
sufficient replication to give power > 0.8, then this should usually be done. However,
huge replication is not always desirable: the treatment differences that can be detected
might be too small to be biologically meaningful, which implies that the experiment
is over-precise and potentially represents a waste of resources. Because of large
background variation or limited experimental resources, or both, it is more common in much
biological research to find that the intended design has weak power (< 0.5). In this case,
there are several options open to the experimenter: the number of treatments tested
might be reduced to allow the replication of the remaining treatments to be increased,
or the experiment might be repeated at a later date. If neither of these options is
available then the investigator must decide if it is worthwhile using resources to pursue an
experiment that is unlikely to detect treatment differences of a given size even if they
are present.



10.4 Constructing a Design for a Particular Experiment 

In previous sections, we have considered how to calculate the power of a t-test within a
given design. In practice, constructing the design for an experiment usually involves a
compromise between several constraints (previously discussed in Chapter 3), of which power
is only one. The first step in designing any experiment is to identify the experimental units
that are to be used. Once these are identified, we need to determine any practical or
physical constraints on the available resources, such as the maximum number of experimental
units available (or affordable), and any physical or practical structures associating groups of
these experimental units. These structures may arise from the intrinsic nature of the units,
for example shelves within a CE cabinet, or from the way in which these units are used in
the experiment, such as subsets of samples processed on different days. A parallel step is to
consider the set of treatments to be tested, and to recognize any structure within this set.
Recall from Chapter 8 that use of a factorial structure is generally more efficient if several
treatment factors are to be included. Finally, these components can be combined to form
one or more candidate designs, and these designs can be compared in terms of power.

EXAMPLE 10.2: COMPARING DESIGNS FOR AN IRRIGATION EXPERIMENT*

An experiment is required to screen a set of candidate willow varieties for susceptibil- 
ity to drought. The experiment is to be on a site with good drainage and low rainfall, 
where drought stress would be expected to occur naturally in most years. The field plots 
are to be set up as four rows of six trees, with the eight trees in the centre of each plot 
being used for measurements. At most 72 field plots are available for the trial. Three 
irrigation treatments are to be applied: no irrigation, occasional (low) irrigation and 
frequent (high) irrigation. The irrigation treatments can be applied only to large blocks 
of land. A core set of four varieties must be included in the trial, but the scientists would 
like to include some of an additional set of six varieties if possible. The requirement to 
use larger blocks of land for irrigation suggests use of a split-plot design (Section 9.2), 
with the irrigation treatments applied to whole plots, and varieties applied to subplots 
within whole plots. 

Drought stress is expected to reduce growth, and the primary aim of the experiment 
is to detect varieties badly affected by drought across a range of characteristics. A sec- 
ondary aim is to quantify the typical response to the differences in water stress. Several 
variables are to be measured after three years, including the number of shoots, where 
all of the varieties are expected to have 15-20 shoots per tree in the absence of water 
stress. This variable is usually evaluated as the mean number of shoots per tree from 
the central eight trees in each plot, and analysed with a square root transformation. The 
design is required to be able to detect a 33% decrease in number of shoots per tree in 
both the irrigation main effect (secondary aim) and in the comparisons across irrigation 
regimes within variety (primary aim). Both tests are to use a significance level of 5%. 

For the square-root-transformed mean number of shoots per tree, previous trials have
shown the estimated subplot variation, $s^2$, is usually close to 0.25, and that the
whole-plot stratum variance increases with the size of the whole plots (number of subplots).

Using these data from previous trials, it is estimated that the whole-plot stratum
variance takes the approximate form $s_W^2 = [(0.1 \times t_B) + 1]s^2$, where $t_B$ is the number of
subplots in each whole plot (equal to the number of varieties).

The statistical task is to find the most powerful design that fits within the constraints. 

The first problem is to fit the scientist's question into the framework of the analysis. Most 
of the information is in a format that we can translate directly, except for the require- 
ment to detect a 33% reduction in shoot numbers. This would translate into an additive 
difference if we were analysing data on the log-transformed scale, but we expect to use 
a square root transformation and on this scale there is no direct translation. However, a 
33% decrease from the expected mean value of about 17.5 shoots per tree is 11.7 shoots, 
or on the square root scale a decrease from 4.2 to 3.4, or 0.8 units and so we shall look for 
decreases of this order, i.e. set $\delta$ = −0.8.
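
The translation of a percentage decrease onto the square root scale can be reproduced with a short calculation. This is a sketch with a hypothetical function name; the baseline of 17.5 shoots per tree and the 33% decrease are taken from the example.

```python
import math


def sqrt_scale_difference(mean_count, fractional_decrease):
    """Change on the square root scale when mean_count falls by the given fraction."""
    reduced = mean_count * (1.0 - fractional_decrease)
    return math.sqrt(reduced) - math.sqrt(mean_count)


# Example 10.2: a 33% decrease from about 17.5 shoots per tree.
delta = sqrt_scale_difference(17.5, 0.33)
print(round(math.sqrt(17.5), 1), round(math.sqrt(17.5 * 0.67), 1), round(delta, 2))
```

Unlike on the log scale, the size of this difference depends on the baseline mean, so the calculation would need repeating for a different expected shoot number.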

There are two comparisons of interest, comparisons across irrigation regime within 
variety and overall comparisons of irrigation regime; we shall consider each in turn. 

We use the notation for split-plot designs introduced in Section 9.2. The population
means for the different treatment combinations are labelled as $\mu_{jk}$, where the first index
$j = 1 \ldots t_A$ labels the irrigation treatments and the second index $k = 1 \ldots t_B$ labels the
varieties. The number of irrigation treatments is fixed at $t_A$ = 3 and the number of varieties $t_B$
is to be decided. The number of replicate blocks, denoted m, is also to be determined.

For the primary aim, we are interested in comparisons between irrigation treatments
within a variety, i.e. comparisons of the form $\mu_{jk} - \mu_{rk}$ with $j \neq r$. From Section 9.2.3, we
know that the SED of the estimated comparisons takes the form

$$\mathrm{SE}(\hat{\mu}_{jk} - \hat{\mu}_{rk}) = \sqrt{2[s_W^2 + (t_B - 1) \times s^2]/(t_B \times m)} .$$

We can use our estimate of $s^2$ = 0.25, with $s_W^2 = [(0.1 \times t_B) + 1]s^2$ to calculate

$$s_W^2 + (t_B - 1)s^2 = [(0.1 \times t_B) + 1]s^2 + (t_B - 1)s^2 = 1.1 \times t_B \times s^2 = 0.275 \times t_B ,$$

and simplify the SED as

$$\mathrm{SE}(\hat{\mu}_{jk} - \hat{\mu}_{rk}) = \sqrt{2 \times 0.275 \times t_B/(t_B \times m)} = \sqrt{0.55/m} .$$

The associated degrees of freedom (Equation 9.2) can be slightly simplified (using $t_A$ = 3
and omitting redundant '×' symbols) as

$$\mathrm{df} = \frac{(m - 1)[s_W^2 + (t_B - 1)s^2]^2}{s_W^4/(t_A - 1) + (t_B - 1)s^4/t_A}
= \frac{(m - 1)[1.1 t_B s^2]^2}{\frac{1}{2}[(0.1 t_B + 1)^2 s^4] + \frac{1}{3}(t_B - 1)s^4}
= \frac{(m - 1)[1.1 t_B]^2}{\frac{1}{2}(0.1 t_B + 1)^2 + \frac{1}{3}(t_B - 1)} .$$

If we chose a design with m = 5 blocks, each containing $t_B$ = 4 varieties, then the SED for
irrigation comparisons within a variety is SED = $\sqrt{0.55/5} = \sqrt{0.11}$ = 0.332 with

$$\mathrm{df} = \frac{(m - 1)[1.1 t_B]^2}{\frac{1}{2}(0.1 t_B + 1)^2 + \frac{1}{3}(t_B - 1)}
= \frac{4 \times [4.4]^2}{\frac{1}{2}(1.4)^2 + \frac{1}{3}(3)} = \frac{77.44}{1.98} = 39.11 .$$

We can use these values to calculate power as described in Section 10.3 and
demonstrated in Example 10.1C. The critical value of the two-sided t-test under the null
hypothesis is $t_{39.11}^{[0.025]}$ = 2.023. Under the alternative hypothesis $\delta$ = −0.8, the
non-centrality parameter is then $\delta/\mathrm{SED}$ = −0.8/0.332 = −2.41. The CDF of the non-central
t-distribution satisfies

$$F_t(-2.023, -2.41, 39.11) = 0.653, \qquad F_t(2.023, -2.41, 39.11) = 1.000 ,$$

and hence the power for a two-sided test with $\delta$ = −0.8 is equal to 1 − 1.000 + 0.653 = 0.653.
Similar calculations can be made for other numbers of blocks and varieties.
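
The SED and approximate df for the within-variety comparison can be evaluated for any number of blocks and varieties. The sketch below uses a hypothetical function name and assumes, as in the example, a subplot variance of 0.25, three irrigation treatments, and a whole-plot stratum variance of (0.1 × number of varieties + 1) times the subplot variance.

```python
import math


def within_variety_sed_df(m, t_b, s2=0.25, t_a=3):
    """SED and approximate df for comparing irrigation treatments within a variety.

    m   : number of replicate blocks
    t_b : number of varieties (subplots per whole plot)
    s2  : subplot variance; whole-plot stratum variance is (0.1 * t_b + 1) * s2
    t_a : number of irrigation (whole-plot) treatments
    """
    s2_w = (0.1 * t_b + 1.0) * s2
    combined = s2_w + (t_b - 1) * s2            # s_W^2 + (t_B - 1) s^2
    sed = math.sqrt(2.0 * combined / (t_b * m))
    df = (m - 1) * combined ** 2 / (
        s2_w ** 2 / (t_a - 1) + (t_b - 1) * s2 ** 2 / t_a
    )
    return sed, df


# Design with m = 5 blocks and 4 varieties, as in the worked calculation.
sed, df = within_variety_sed_df(m=5, t_b=4)
print(round(sed, 3), round(df, 2))  # 0.332 39.11
```

The df returned is generally not an integer, so the critical value and power must come from routines that accept fractional df.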

For the secondary aim, a similar process can be followed. Here, we are interested
in overall comparisons between irrigation treatments, i.e. comparisons of the form
$\mu_{j\cdot} - \mu_{r\cdot}$ with $j \neq r$. Again from Section 9.2.3, the SED of the estimated comparisons takes
the form

$$\mathrm{SE}(\hat{\mu}_{j\cdot} - \hat{\mu}_{r\cdot}) = \sqrt{2 s_W^2/(t_B \times m)} = \sqrt{2 s^2 (0.1 t_B + 1)/(t_B \times m)} = \sqrt{0.5 \times (0.1 t_B + 1)/(t_B \times m)} ,$$

with $(t_A - 1) \times (m - 1) = 2 \times (m - 1)$ df. For the design with m = 5 blocks, each containing
$t_B$ = 4 varieties, the SED for irrigation comparisons is

$$\mathrm{SE}(\hat{\mu}_{j\cdot} - \hat{\mu}_{r\cdot}) = \sqrt{0.5 \times (0.1 t_B + 1)/(t_B \times m)} = \sqrt{0.5 \times 1.4/20} = \sqrt{0.035} = 0.187 ,$$

with 8 df. The critical value of the two-sided t-test under the null hypothesis is
$t_8^{[0.025]}$ = 2.306. Under the alternative hypothesis $\delta$ = −0.8, the non-centrality parameter is
then $\delta/\mathrm{SED}$ = −0.8/0.187 = −4.276, and the CDF of the non-central t-distribution satisfies

$$F_t(-2.306, -4.276, 8) = 0.961, \qquad F_t(2.306, -4.276, 8) = 1.000 .$$

Hence, the power for this two-sided test with $\delta$ = −0.8 is equal to 1 − 1.000 + 0.961 = 0.961.

Table 10.4 presents the results of similar power calculations for several designs that fit
the experimental constraints. The upper limit of 72 field plots means that as the number
of replicate blocks increases, the number of varieties that can be tested decreases. For
both comparisons, the power is heavily influenced by the number of replicate blocks:
as the number of blocks decreases, so does the power. The power for the main effects is
good (≥ 0.70) for all of the designs with three or more blocks. The power for the
comparison within varieties is much less, and exceeds 0.5 only for designs with four or more
blocks, which allows a maximum of six varieties. The design with four blocks and six
varieties appears promising: it uses all of the available plots, has high power for the
irrigation main effect (0.95) and reasonable power for the comparison within varieties (0.56),
and tests two additional varieties. If this power is insufficient, then the design with six
blocks and four varieties might be preferred, as this has power of 0.99 for the main effect
and 0.74 for the comparison within varieties.
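
The set of candidate designs compared in Table 10.4 can be enumerated programmatically. The sketch below lists every combination of 4 to 10 varieties and at least two blocks that fits within 72 plots, and computes the SED and df for both comparisons; power is omitted because it requires the non-central t CDF. The names and the dictionary layout are assumptions made for illustration.

```python
import math

MAX_PLOTS = 72   # field plots available
T_A = 3          # irrigation treatments (whole plots per block)
S2 = 0.25        # assumed subplot variance on the square root scale


def design_summary(t_b, m):
    """SED and df for both irrigation comparisons with t_b varieties and m blocks."""
    s2_w = (0.1 * t_b + 1.0) * S2               # assumed whole-plot stratum variance
    combined = s2_w + (t_b - 1) * S2
    return {
        "varieties": t_b,
        "blocks": m,
        "units": T_A * t_b * m,
        "sed_within": math.sqrt(2.0 * combined / (t_b * m)),
        "df_within": (m - 1) * combined ** 2
        / (s2_w ** 2 / (T_A - 1) + (t_b - 1) * S2 ** 2 / T_A),
        "sed_main": math.sqrt(2.0 * s2_w / (t_b * m)),
        "df_main": (T_A - 1) * (m - 1),
    }


def candidate_designs(min_varieties=4, max_varieties=10, min_blocks=2):
    """All designs within MAX_PLOTS, with most blocks first for each variety count."""
    designs = []
    for t_b in range(min_varieties, max_varieties + 1):
        max_blocks = MAX_PLOTS // (T_A * t_b)
        for m in range(max_blocks, min_blocks - 1, -1):
            designs.append(design_summary(t_b, m))
    return designs


designs = candidate_designs()
for d in designs:
    print(d["varieties"], d["blocks"], d["units"],
          f"{d['sed_within']:.2f}", f"{d['df_within']:.2f}",
          f"{d['sed_main']:.2f}", d["df_main"])
```

Under these assumptions the enumeration yields the same 17 combinations of varieties and blocks as the table, which makes it straightforward to re-run the comparison for a different plot limit or variance estimate.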



TABLE 10.4

Split-Plot Design with m Blocks, Three Whole Plots and t_B Subplots: SED, df and Power
for Comparing Irrigation (Whole-Plot Treatment) within Varieties (Subplot Treatment)
and the Main Effect of Irrigation (Example 10.2)

                  Number of                   Irrigation Comparison     Irrigation Main
                                              within Varieties          Effect
Varieties   Blocks   Units
(t_B)       (m)      (3 x t_B x m)            SED    df      Power      SED    df    Power

 4          6        72                       0.30   48.89   0.736      0.17   10    0.987
 4          5        60                       0.33   39.11   0.653      0.19    8    0.961
 4          4        48                       0.37   29.33   0.550      0.21    6    0.887
 4          3        36                       0.43   19.56   0.427      0.24    4    0.701
 4          2        24                       0.52    9.78   0.281      0.30    2    0.335
 5          4        60                       0.37   36.92   0.556      0.19    6    0.927
 5          3        45                       0.43   24.61   0.435      0.22    4    0.762
 5          2        30                       0.52   12.31   0.291      0.27    2    0.372
 6          4        72                       0.37   44.35   0.560      0.18    6    0.951
 6          3        54                       0.43   29.57   0.440      0.21    4    0.807
 6          2        36                       0.52   14.78   0.297      0.26    2    0.405
 7          3        63                       0.43   34.42   0.440      0.20    4    0.839
 7          2        42                       0.52   17.21   0.302      0.25    2    0.432
 8          3        72                       0.43   39.18   0.445      0.19    4    0.864
 8          2        48                       0.52   19.59   0.306      0.24    2    0.454
 9          2        54                       0.52   21.92   0.308      0.23    2    0.474
10          2        60                       0.52   24.20   0.311      0.22    2    0.491

10.5 A Different Hypothesis: Testing for Equivalence

In the context of hypothesis testing, it is important to remember that although one can
obtain evidence against the null hypothesis, one cannot evaluate evidence in favour of
the null hypothesis. If the null hypothesis is not rejected, there are two possible
explanations: either the null hypothesis is true, or it is false but the background variation is
sufficiently large to mask treatment differences (i.e. the experiment has insufficient power to
detect the true treatment difference). The question of interest is therefore usually posed
as the alternative hypothesis. However, in some cases, the question of interest is whether
there is equality (or equivalence) of treatment population means, which corresponds to
the usual null hypothesis. This situation often occurs when a new (sometimes faster or
cheaper) treatment is compared with a standard; the aim is to show that the new treatment
is equivalent to the standard so that it can be adopted. This scenario is known as
equivalence testing and is widespread in pharmaceutical studies, although less well-established
in plant science research. To test the question of equivalence, one must specify a region of
equivalence and switch the roles of the two hypotheses. To illustrate these concepts we use
a simple example with two treatment groups, with population means $\mu_1$ and $\mu_2$,
respectively, and difference $\delta = \mu_1 - \mu_2$.

We first define a region of equivalence by specifying a quantity ($c$) such that a difference
of $c$ units between two population means is not considered biologically meaningful,
so that two population means are considered equivalent if $|\mu_1 - \mu_2| = |\delta| < c$. In this
context, the null hypothesis to be tested is that the two population means are different, or
$H_0$: $|\delta| \ge c$, against the alternative hypothesis $H_1$: $|\delta| < c$. This is an interval hypothesis,
i.e. the null hypothesis corresponds to a range of values. The equivalence testing
procedure splits this null hypothesis into two one-sided components:

$$H_{0a}: \delta \le -c ,$$

$$H_{0b}: \delta \ge c .$$

Each of these null hypotheses then has a corresponding alternative hypothesis, i.e.
$H_{1a}$: $\delta > -c$ and $H_{1b}$: $\delta < c$. Each hypothesis can then be tested by a one-sided t-test, with test
statistics $t_a$ and $t_b$ defined as

$$t_a = \frac{d + c}{\mathrm{SED}} , \qquad t_b = \frac{d - c}{\mathrm{SED}} .$$

Here $d$ is the observed treatment difference, $d = \hat{\mu}_1 - \hat{\mu}_2$, and SED is the estimated SE for
the treatment comparison with associated df equal to ResDF. For a test with significance
level $\alpha_s$, we then reject null hypothesis $H_{0a}$ if $t_a > t^{[\alpha_s]}_{\mathrm{ResDF}}$. Similarly, we reject null
hypothesis $H_{0b}$ if $t_b < -t^{[\alpha_s]}_{\mathrm{ResDF}}$. The overall null hypothesis, $H_0$: $|\delta| \ge c$, is rejected if both $H_{0a}$ and
$H_{0b}$ are rejected, giving evidence in favour of the alternative hypothesis of equivalence
between the treatment means. This procedure is generally referred to as two one-sided
t-tests (TOST). As usual in hypothesis testing, there is a correspondence between the
hypothesis test and a related confidence interval (CI). In this case, if the $100(1 - 2\alpha_s)$%
confidence interval $d \pm (\mathrm{SED} \times t^{[\alpha_s]}_{\mathrm{ResDF}})$ is completely contained within the limits $(-c, c)$, then the
null hypothesis of inequivalence is rejected at significance level $\alpha_s$ and we have positive
evidence of equivalence.
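
The TOST procedure can be sketched in a few lines. This is an illustrative sketch only: the function name `tost`, its arguments and the numerical inputs are hypothetical, and the critical value is supplied by the user (here 1.812, the upper 5% point of the t-distribution on 10 df) rather than computed.

```python
def tost(d, sed, t_crit, c):
    """Two one-sided t-tests for equivalence within the region (-c, c).

    d      : observed treatment difference
    sed    : standard error of the difference
    t_crit : upper alpha_s point of the t-distribution on ResDF df
    c      : half-width of the region of equivalence
    """
    t_a = (d + c) / sed            # tests H0a: delta <= -c
    t_b = (d - c) / sed            # tests H0b: delta >= c
    reject_a = t_a > t_crit
    reject_b = t_b < -t_crit
    # Matching 100(1 - 2*alpha_s)% confidence interval for the difference.
    ci = (d - sed * t_crit, d + sed * t_crit)
    return {"t_a": t_a, "t_b": t_b, "equivalent": reject_a and reject_b, "ci": ci}


# Hypothetical inputs: difference 1.2, SED 0.8, 10 residual df, margin c = 3.
result = tost(d=1.2, sed=0.8, t_crit=1.812, c=3.0)
print(result)

# A larger observed difference with the same precision fails the second test.
result_far = tost(d=2.0, sed=0.8, t_crit=1.812, c=3.0)
print(result_far["equivalent"], result_far["ci"])
```

In both calls the decision from the two t-statistics agrees with checking whether the confidence interval lies wholly inside (-c, c), which is the correspondence described above.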

This procedure can be particularly useful as a secondary test where differences detected 
by ANOVA are considered to be biologically unimportant. Once an equivalence range 
has been defined, an equivalence test can establish whether there is evidence to reject a 
null hypothesis of inequivalence. For a very precise experiment, it is possible for small 
differences between treatments to be detected as significant but to then obtain evidence 
of equivalence. However, questions of power still arise, and a non-significant equivalence 
test does not prove inequivalence. 

It is possible to switch the null and alternative interval hypotheses to obtain a direct test
for inequivalence. In this case, the null hypothesis is $H_0$: $|\delta| \le c$, which is tested against
the alternative hypothesis $|\delta| > c$. The test statistics are the same as above, with the null
hypothesis (equivalence) rejected if either $t_a < -t^{[\alpha_s]}_{\mathrm{ResDF}}$ or $t_b > t^{[\alpha_s]}_{\mathrm{ResDF}}$. This test rejects
equivalence if the CI calculated as $d \pm (\mathrm{SED} \times t^{[\alpha_s]}_{\mathrm{ResDF}})$ has either its upper limit below $-c$ or its
lower limit above $c$. McBride (1999) demonstrates the use of this test (and equivalence tests)
in the context of environmental monitoring. Note that this interval-based null hypothesis
is not the same as our usual point null hypothesis $H_0$: $\delta = 0$, and so the results of the two
tests might not match.
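
A short sketch of the inequivalence test shows that the t-statistic rule and the confidence-interval rule give the same decision. The function name `inequivalence_test` and all numerical inputs are hypothetical, and the critical value (1.812, the upper 5% point on 10 df) is supplied rather than computed.

```python
def inequivalence_test(d, sed, t_crit, c):
    """Test H0: |delta| <= c against |delta| > c.

    Returns the decision twice: once from the t-statistics and once from
    the confidence interval d +/- sed * t_crit.
    """
    t_a = (d + c) / sed
    t_b = (d - c) / sed
    lower, upper = d - sed * t_crit, d + sed * t_crit
    reject_by_t = (t_a < -t_crit) or (t_b > t_crit)
    reject_by_ci = (upper < -c) or (lower > c)
    return reject_by_t, reject_by_ci


# Hypothetical case 1: the whole interval lies above c = 3.
clear = inequivalence_test(d=5.0, sed=0.8, t_crit=1.812, c=3.0)

# Hypothetical case 2: d/SED = 4.375 would be declared different from zero
# by the usual point null test, yet inequivalence is not established.
unclear = inequivalence_test(d=3.5, sed=0.8, t_crit=1.812, c=3.0)

print(clear, unclear)
```

The second case illustrates the closing remark above: a difference can be significant against the point null hypothesis without giving evidence of inequivalence.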



EXAMPLE 10.3: MEASURING SOIL MICROBIAL BIOMASS 

An experiment was done to investigate the effects of changing the procedure for pro- 
cessing samples to obtain measurements of carbon in soil microbial biomass (as mg 
C per kg soil). The protocol under examination used 200 g soil samples passed over a 
2.5 mm sieve and shaken for 60 min. The experiment tested the effects of a larger sieve, 
two smaller sample weights and a reduced shaking time, giving a 2x3x2 factorial 
structure. Each of the 12 treatment combinations was replicated four times in a CRD. 
The aim of analysis is to quantify the effects of the individual modifications, whether 
they interact, and to evaluate whether any of the modified procedures obtain results 
within 10% of the standard protocol. The data are listed in Table 10.5 and held in file
biomassc.dat. The mean for the standard protocol is 1095.5 mg C/kg, so we consider
differences smaller than 110 mg C/kg as unimportant. 

Factors Size (sieve size), Weight (sample weight) and Time (shaking time) define the
treatment combinations, with response variate C (microbial carbon biomass). There is
no structural component of the linear model, which can be written as

Response variable: C

Explanatory component: [1] + Size*Weight*Time

The explanatory component is a three-way crossed structure, and the ANOVA table for
this model is Table 10.6. There is no evidence of interactions between the different
modifications, but strong evidence that increasing the sieve size and decreasing the shaking
time both decrease the quantity of biomass C measured. However, these results do not
establish equivalence (or inequivalence) of any of the 11 test combinations in relation to
the standard protocol. To evaluate this, we examine 95% CIs based on the interval
hypothesis, $H_0$: $|\delta| \le 110$, shown in Table 10.7.

The confidence limits are calculated using the 95th percentile (upper 5% point) of the
t-distribution on 36 df ($t^{[0.05]}_{36} = 1.688$) with SED = 48.5. There is only one case (small sieve, 50 g weight,
60 min shaking) where there is evidence of equivalence; in this case, the 95% CI (-61.4,
102.4) is entirely contained within the limits (-110, 110). On the other hand, there is no
evidence of inequivalence (no lower limits > 110 and no upper limits < -110). The absence
of positive results here reflects the large amount of background variation and hence

TABLE 10.5

Biomass Carbon (C) Measurements on 48 Samples from a CRD for Different Combinations
of Sieve Size (S = 2.5 mm, L = 12 mm), Sample Weight (g) and Shaking Time (min), Listed in
Treatment Order (Example 10.3 and file biomassc.dat)

Size  Weight  Time     C      Size  Weight  Time     C      Size  Weight  Time     C

L       20     30     971     L      200     30     951     S       50     30     995
L       20     30     858     L      200     30     878     S       50     30    1177
L       20     30     984     L      200     30     882     S       50     30     951
L       20     30     900     L      200     30     918     S       50     30    1118
L       20     60    1062     L      200     60     974     S       50     60    1050
L       20     60    1028     L      200     60    1097     S       50     60    1196
L       20     60    1020     L      200     60     996     S       50     60    1116
L       20     60    1106     L      200     60    1048     S       50     60    1102
L       50     30     956     S       20     30     965     S      200     30     904
L       50     30    1083     S       20     30    1068     S      200     30     983
L       50     30     764     S       20     30     922     S      200     30     959
L       50     30     836     S       20     30     968     S      200     30     926
L       50     60    1030     S       20     60    1115     S      200     60    1050
L       50     60    1014     S       20     60    1123     S      200     60    1016
L       50     60     981     S       20     60    1167     S      200     60    1144
L       50     60    1065     S       20     60    1181     S      200     60    1172

Source: Data from Rothamsted Research (P. Brookes).



uncertainty in this experiment; the power to detect a difference of 110 mg between two
treatment combinations is only 60%. Moreover, we have not adjusted for the number of
tests (11) made, and so our overall rate of Type I error will be larger than the nominal
value of 0.05 (see Section 8.8). Following McBride (1999), we could adjust the significance
level using a Bonferroni correction, making the confidence limits even wider.

We can conclude from this experiment that shaking time and sieve size affect the 
quantity of biomass measured, but we require additional data to establish whether the 
different procedures give measurements within the 10% range specified. 
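
The limits in Table 10.7 can be checked from quantities already reported. The sketch below is an illustration under stated assumptions: it takes the residual mean square from Table 10.6, the treatment means from Table 10.7 and the critical value 1.688 quoted in the text, and it assumes the usual SED for comparing two means with four replicates each, sqrt(2 x ResMS / 4). The variable names are illustrative only.

```python
import math

RESIDUAL_MS = 4705.15   # residual mean square on 36 df (Table 10.6)
REPLICATES = 4          # replicates per treatment combination
T_CRIT = 1.688          # t critical value on 36 df quoted in the text
C = 110.0               # equivalence margin (mg C/kg)
STANDARD = 1095.50      # mean for small sieve, 200 g, 60 min

# Treatment means for the 11 test combinations (Table 10.7).
MEANS = {
    ("Large", 20, 30): 928.25, ("Large", 20, 60): 1054.00,
    ("Large", 50, 30): 909.75, ("Large", 50, 60): 1022.50,
    ("Large", 200, 30): 907.25, ("Large", 200, 60): 1028.75,
    ("Small", 20, 30): 980.75, ("Small", 20, 60): 1146.50,
    ("Small", 50, 30): 1060.25, ("Small", 50, 60): 1116.00,
    ("Small", 200, 30): 943.00,
}

sed = math.sqrt(2 * RESIDUAL_MS / REPLICATES)
half_width = sed * T_CRIT

results = {}
for combo, mean in MEANS.items():
    d = mean - STANDARD
    lower, upper = d - half_width, d + half_width
    results[combo] = {
        "difference": d,
        "lower": lower,
        "upper": upper,
        "equivalent": -C < lower and upper < C,      # interval inside (-c, c)
        "inequivalent": upper < -C or lower > C,     # interval outside (-c, c)
    }

for combo, r in results.items():
    print(combo, f"{r['difference']:8.2f} {r['lower']:8.1f} {r['upper']:8.1f}",
          "Yes" if r["equivalent"] else "No")
```

Under these assumptions the SED evaluates to about 48.5 and only the small sieve, 50 g, 60 min combination is classed as equivalent, in agreement with the text.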



TABLE 10.6

ANOVA Table for Soil Microbial Carbon Biomass Measured Using Two
Sieve Sizes (Factor Size), Three Sample Weights (Factor Weight) and Two
Shaking Times (Factor Time) (Example 10.3)

Source of Variation    df    Sum of Squares    Mean Square    Variance Ratio    P

Size                    1        80,524.08       80,524.08        17.114        < 0.001
Weight                  2        12,060.67         6030.33         1.282          0.290
Time                    1       179,585.33      179,585.33        38.167        < 0.001
Size.Weight             2        10,543.17         5271.58         1.120          0.337
Size.Time               1            65.33           65.33         0.014          0.907
Weight.Time             2          8855.17         4427.58         0.941          0.400
Size.Weight.Time        2          5744.67         2872.33         0.610          0.549
Residual               36       169,385.50         4705.15
Total                  47       466,763.92

TABLE 10.7

Treatment Means and Differences from Standard Protocol (Small Sieve, 200 g Sample
Weight, 60 min Shaking Time) with 95% CI for the Differences Based on Interval
Hypothesis $H_0$: $|\delta| \le 110$ (Example 10.3)

                                                   Lower     Upper     Equivalent
Size     Weight   Time     Mean      Difference    Limit     Limit     to Standard?

Large      20      30      928.25     -167.25      -249.1     -85.4    No
Large      20      60     1054.00      -41.50      -123.4      40.4    No
Large      50      30      909.75     -185.75      -267.6    -103.9    No
Large      50      60     1022.50      -73.00      -154.9       8.9    No
Large     200      30      907.25     -188.25      -270.1    -106.4    No
Large     200      60     1028.75      -66.75      -148.6      15.1    No
Small      20      30      980.75     -114.75      -196.6     -32.9    No
Small      20      60     1146.50       51.00       -30.9     132.9    No
Small      50      30     1060.25      -35.25      -117.1      46.6    No
Small      50      60     1116.00       20.50       -61.4     102.4    Yes
Small     200      30      943.00     -152.50      -234.4     -70.6    No
Small     200      60     1095.50        0            -         -      -



EXERCISE

10.1 You need to design an experiment in which you have to first make extracts from
different cultivars and then process those extracts through a machine to compare
the cultivars. You have four cultivars that you must test, and another four
that you are quite interested in. It requires 10 plants (grown in the same pot) to
make one extract to run through the machine and only one extract can be run at
a time. You have the resources to make and process up to a total of 30 extracts.
However, the machine needs resetting at least every eight runs, and the level
of its readings may vary slightly each time it is reset. A batch of four to eight
runs between resetting the machine can therefore be considered as a block. A
pilot study has shown that the background variation across a set of four to eight
runs is about 1 unit$^2$, and you wish to detect treatment differences of 2 units.
Consider and compare possible designs for both stages of this experiment.



11 

Dealing with Non-Orthogonality



This chapter explores the concept of orthogonality, its role in designs and the
consequences of non-orthogonality, either between two (or more) treatment factors or between
blocking and treatment factors. Non-orthogonality between explanatory variates, as may
occur in regression models (Chapter 12), is usually termed collinearity and this concept
is discussed in more detail in Chapter 14. A sufficient condition for two factors (or terms)
to be orthogonal is given in Section 11.1. The procedure for analysis of a crossed model
for two non-orthogonal treatment factors is then described in detail (Section 11.2). If two
factors are non-orthogonal then parameter estimates may change according to the terms
present in the model (Section 11.2.1) and a unique ANOVA table for the experiment no
longer exists. Some consideration must be given to the order in which the factors are fitted
and to the interpretation of the treatment sums of squares, giving rise to several possible
sequential ANOVA tables (Section 11.2.2). This also results in different types of sums of
squares (Section 11.2.3) and procedures for model selection (Section 11.2.4). The manner in
which predictions are formed for individual treatments is also more complex, and affects
their interpretation (Section 11.2.5).

Non-orthogonality between block and treatment factors can be planned, structured and
exploited to obtain an efficient design (Section 11.3). Several classes of design for a
factorial treatment structure exploit non-orthogonality to reduce the resources required for an
experiment. Fractional factorial designs (FFDs) (Section 11.3.1) reduce the replication and
may even omit certain treatment combinations to minimize the number of experimental
units, although some knowledge of the system is required to obtain a meaningful analysis.
Factorial designs with confounding enable the efficient allocation of treatment combinations
to small blocks (Section 11.3.2). Often non-orthogonality is unplanned, because either there
are missing values in the data (Section 11.4), treatment factors are accidentally misallocated,
or unplanned events lead to additional (extraneous) factors in the model (Section 11.5).

Most statistical packages contain several algorithms that can be used to analyse a linear
model, depending on features of the design, and most such algorithms can deal with an
explanatory component alone. Options become more limited when both explanatory and
structural components are present. Multi-stratum ANOVA algorithms can deal with
separate explanatory and structural components, but require a balanced orthogonal structure.
Linear mixed models provide a more complex alternative for non-orthogonal structures
(Chapter 16). However, in some cases, one can combine the explanatory and structural
components into a single model component and still obtain a valid analysis. This approach
is often called an 'intra-block analysis' (Section 11.6).



11.1 The Benefits of Orthogonality 

Two explanatory variables (or terms) in a linear model are said to be orthogonal if the
estimated parameter effects and sum of squares for each term are the same regardless of
whether the other term is included or not in the model. A more rigorous mathematical
definition of orthogonality is beyond the scope of this book (details can be found in Bailey,
2008), but this definition will suffice here. For example, consider the case of two factors, A
and B, in a design with equal replication of all factorial combinations and no experimental
structure. Here, we consider the additive model, [1] + A + B, consisting of the overall
constant and main effects of the two factors. In this case, estimates of the A and B main effects
are the same regardless of whether the other main effect is fitted. Likewise, the sums of
squares for each main effect term are the same whether the model is specified as [1] + A + B
or as [1] + B + A, as illustrated in Example 11.1A. The main effect terms, corresponding to
factors A and B, are thus orthogonal in this design.

EXAMPLE 11.1A: BEETLE MATING

Consider the 2 x 2 factorial beetle mating experiment described in Example 8.1 (data
in file beetles.dat) where females from two species of willow beetle (factor Species)
mated with males from either their own species (intraspecies mating) or the other
species (interspecies mating, factor MateType). The response analysed was the
log10-transformed number of eggs laid by each female. The parameter estimates for the main
effects were derived in Example 8.1B (see Table 8.2) from the margins of a two-way table 
of observed treatment means. This derivation does not depend on the order in which the 
terms are fitted, or on which terms are fitted in the model. Table 11.1 shows the ANOVA 
tables for the explanatory component specified either as [1] + MateType + Species or 
with factors in the other order as [1] + Species + MateType. The sums of squares for 
each factor are the same for both orders, confirming that these factors are orthogonal. 

For two treatment factors, the easiest way to assess the orthogonality of the design is
to obtain a two-way table containing counts of replicates for each combination of the two
factors. The simplest case of an orthogonal design is where observations are present in all
cells with equal replication. If some cells are empty, or if replication is unequal, then the
factors will usually (but not always) be non-orthogonal. As a rule of thumb, if all marginal
means for one factor in the two-way table involve equal representation from levels of the
other factor, then the design will be orthogonal.

For two factors, we can write down a mathematical condition sufficient for orthogonality
(Mead et al., 2012, Chapter 7). If $n_{rs}$ is the replication for the rth level of the first factor and
the sth level of the second factor, then the two factors are orthogonal if, for all
combinations of r and s,

$$n_{rs} = \frac{n_{r\cdot} \times n_{\cdot s}}{N} , \qquad (11.1)$$



TABLE 11.1

ANOVA Tables for Main Effects Model Fitted in Two Different Orders for
log10(Number of Eggs) in the Beetle Mating Experiment (Example 11.1A)

           Sequence 1                              Sequence 2
Source of              Sum of          Source of              Sum of
Variation      df      Squares         Variation      df      Squares

MateType        1      0.3807          Species         1      0.9031
Species         1      0.9031          MateType        1      0.3807
Residual       36      0.9484          Residual       36      0.9484
Total          39      2.2322          Total          39      2.2322

where $n_{r\cdot} = \sum_s n_{rs}$ is the total number of observations for the rth level of the first factor,
$n_{\cdot s} = \sum_r n_{rs}$ is the total number of observations for the sth level of the second factor and, as
usual, N is the total number of observations.



EXAMPLE 11.1B: BEETLE MATING

Each of the four treatment combinations in the beetle mating experiment is replicated
10 times, so $n_{rs} = 10$ for r = 1, 2 and s = 1, 2. The total count for each level of the individual
factors is 20, giving $n_{r\cdot} = 20$ and $n_{\cdot s} = 20$, with N = 40. Hence, the condition for
orthogonality given in Equation 11.1 is satisfied as

$$\frac{n_{r\cdot} \times n_{\cdot s}}{N} = \frac{20 \times 20}{40} = 10 = n_{rs} \quad \text{for any } r = 1, 2,\ s = 1, 2 .$$

11.2 Fitting Models with Non-Orthogonal Terms 

In this section we demonstrate the process of fitting models and making statistical infer- 
ences for two non-orthogonal factors with a crossed treatment structure, paying partic- 
ular attention to steps where the procedure or inference differs from that described in 
Chapter 8 for orthogonal factors. We illustrate these differences by comparing the analysis 
of Example 11.1, which has an orthogonal structure, with that of Example 11.2, which has 
a non-orthogonal structure. For simplicity, we have chosen examples with no structural
component, but the same principles apply to investigation of the explanatory component 
when structure is present, within the context of a multi-stratum ANOVA. 

EXAMPLE 11.2A: GENETICS OF ROOT GROWTH*

An experiment was conducted to investigate the genetic component of root growth
in manipulated lines. Two male parents (factor Male, levels M1 and M2) were crossed
with five female parents (factor Female, levels F1-F5) and eight seeds were to be grown
from each cross in a CRD. Root growth (maximum length) was measured (mm) after
three weeks (variate Root). Unfortunately, many of the seeds were not viable because
of genetic incompatibilities, leading to reduced replication of some treatments with
only 30 observations in total. The data are provided in file cross.dat and displayed in
Table 11.2.

For this two-way factorial the pattern of replication is without structure, as shown
in Table 11.3, and the Male and Female factors are non-orthogonal. This can be verified
using the condition presented in Equation 11.1. For example, consider the replication for
offspring of male parent M1 with female parent F1, with $n_{11} = 6$. The marginal
replication for male parent M1 is $n_{1\cdot} = 19$ and the marginal replication for female parent F1 is
$n_{\cdot 1} = 8$, and there are 30 observations (N = 30). Orthogonality then requires replication
of 19 x 8/30 = 5.07, but this is not an integer value and so cannot equal the actual
replication, here $n_{11} = 6$, confirming that the structure is non-orthogonal.
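
The condition in Equation 11.1 is easy to check for any two-way table of replication counts. The sketch below is illustrative: the function name `is_orthogonal` is hypothetical, and the third table (`proportional`) is an invented example of unequal but proportional replication, included only to show that unequal replication does not always imply non-orthogonality.

```python
from fractions import Fraction


def is_orthogonal(counts):
    """Check n_rs == n_r. * n_.s / N for every cell of a two-way count table."""
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    n_total = sum(row_totals)
    return all(
        # Exact rational arithmetic avoids floating-point comparison issues.
        Fraction(row_totals[r] * col_totals[s], n_total) == counts[r][s]
        for r in range(len(row_totals))
        for s in range(len(col_totals))
    )


beetle = [[10, 10], [10, 10]]                   # Example 11.1B replication
root = [[6, 3, 2, 5, 3], [2, 3, 4, 1, 1]]       # Table 11.3 (rows M1, M2)
proportional = [[2, 4], [3, 6]]                 # hypothetical, unequal replication

print(is_orthogonal(beetle), is_orthogonal(root), is_orthogonal(proportional))
```

For the root growth table the first cell already fails, since 19 x 8/30 is not equal to 6, matching the calculation in the example.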

11.2.1 Parameterizing Models for Two Non-Orthogonal Factors 

Although sum-to-zero constraints are often used for balanced designs, this parameteriza- 
tion becomes much less convenient for non-orthogonal structures, and so it is more com- 
mon to use first-level-zero (or last-level-zero) constraints in this context. First-level-zero 

TABLE 11.2

Observed Root Growth (mm) from Offspring of Crosses between Five Female and Two Male
Parents (Example 11.2A and File cross.dat)

Female   Male     Root        Female   Male     Root        Female   Male     Root
Parent   Parent   Growth      Parent   Parent   Growth      Parent   Parent   Growth

F5       M1       76          F3       M2       68          F1       M1       83
F1       M1       83          F4       M1       81          F4       M1       84
F1       M1       85          F3       M2       69          F2       M1       82
F3       M1       75          F5       M2       77          F2       M2       75
F1       M1       88          F2       M2       78          F5       M1       77
F1       M2       80          F2       M1       79          F2       M2       77
F1       M2       79          F4       M2       80          F1       M1       84
F4       M1       83          F1       M1       89          F3       M2       70
F3       M1       80          F4       M1       85          F4       M1       86
F2       M1       81          F5       M1       76          F3       M2       70

TABLE 11.3

Replication of Parental Combinations for Germinated Seed
in the Root Growth Experiment (Example 11.2A)

                          Female Parent
                    F1    F2    F3    F4    F5    Total

Male       M1        6     3     2     5     3      19
Parent     M2        2     3     4     1     1      11
           Total     8     6     6     6     4      30




parameterization was introduced for crossed models with two factors in Section 8.2.6, and 
is used throughout this chapter. Parameter estimates are again obtained by the method of 
least squares. 

In this section, we consider a two-factor crossed treatment structure for factor A with
$t_A$ levels and factor B with $t_B$ levels. As a preliminary step, we examine the two-factor
additive model which excludes the interaction, i.e. [1] + A + B, to explain the parameterization
and to demonstrate the difference between an orthogonal and a non-orthogonal structure.
The additive model takes the general form

$$y_{rsk} = \mu_{11} + \eta_r + \zeta_s + e_{rsk} .$$

First-level-zero constraints are imposed as $\eta_1 = 0$, $\zeta_1 = 0$, and this model corresponds to

Explanatory component: [1] + A + B

In this parameterization, $\mu_{11}$ is the overall constant associated with the term [1]. Because of
the constraints, this constant represents the population mean under this additive model
for a unit with the first level of both factors. The parameters $\eta_r$ (r = 1 ... $t_A$) are
associated with factor A and can be thought of as the expected difference between observations
with the rth and first levels of factor A for any given level of factor B. Similarly,
parameters $\zeta_s$ (s = 1 ... $t_B$), associated with factor B, represent the expected difference between
observations with the sth and first levels of factor B for any given level of factor A. (If the
response to the level of factor A depends on the level of factor B, or vice versa, then we
should need to include the interaction term, as described below.) If we simplify the model
further, by dropping out factor B, then this becomes

$$y_{rsk} = \mu_1 + \eta_r + e_{rsk} ,$$



corresponding to the explanatory structure [1] + A, with constraint $\eta_1 = 0$. Here, we have
relabelled the overall constant, associated with term [1], as $\mu_1$ because it now represents the
population mean for the first level of factor A. The parameters $\eta_r$ (r = 1 ... $t_A$) are now the
expected difference between observations with the rth and first levels of factor A
(regardless of factor B). An analogous model can be constructed for factor B with factor A omitted,
i.e. explanatory structure [1] + B, as

$$y_{rsk} = \mu_1 + \zeta_s + e_{rsk} ,$$

with constraint $\zeta_1 = 0$. The overall constant, associated with term [1], now represents the
population mean for the first level of factor B, and parameters $\zeta_s$ (s = 1 ... $t_B$) are now the
expected difference between observations with the sth and first levels of factor B. Note
that the interpretation (and hence value) of $\mu_1$ changes according to the terms present in
the model.

If the factors A and B are orthogonal, then the estimated parameters associated with each
factor term do not change when the other factor is added to or dropped from the model,
as illustrated in Example 11.1C. The same does not hold for the constant term, which is
marginal to both terms A and B: because the interpretation of the constant changes as
terms are added or dropped, so does its estimated value. (Note that this was not the case
for the sum-to-zero parameterization, in which the interpretation of the constant term as
the overall mean was consistent across different models.)

EXAMPLE 11.1C: BEETLE MATING

Here, we obtain parameter estimates for first-level-zero parameterization (using the
generic notation introduced above) for main effects models with one or both factors.

The estimates for both single factor models and the two-factor additive model are listed
in Table 11.4. Effects labelled as $\eta$ are associated with the MateType factor, and those
labelled $\zeta$ are associated with the Species factor.



TABLE 11.4

Estimated Parameters for Several Models for the log10(Number of Eggs) in the Beetle Mating
Experiment, Using First-Level-Zero Parameterization, with $\eta_1 = 0$, $\zeta_1 = 0$, $(\eta\zeta)_{rs} = 0$ for r = 1 or s = 1
(Example 11.1C)

                                                                      Model
Term                               Parameter               [1] + M + S   [1] + M   [1] + S   [1] + M*S

[1]                                $\mu_{11}$ or $\mu_1$       1.561       1.711     1.658      1.513
MateType Intra                     $\eta_2$                    0.195       0.195       -        0.291
Species P. vulg.                   $\zeta_2$                   0.301         -       0.301      0.396
MateType Intra.Species P. vulg.    $(\eta\zeta)_{22}$            -           -         -       -0.191

Note: In models M = MateType, S = Species.

As each factor has only two levels, the estimated parameters represent the expected
difference between the second and first levels of each factor. It is straightforward to
verify that these figures are consistent with main effect estimates given under the
sum-to-zero parameterization in Table 8.2c. For example, under first-level-zero
parameterization the estimate of the effect $\zeta_2$, associated with Species P. vulgatissima and equal to
0.301, is the difference between the main effect estimates for Species under the
sum-to-zero parameterization:

$$\text{Species}_2 - \text{Species}_1 = 0.1503 - (-0.1503) = 0.301 .$$

Since the MateType and Species factors are orthogonal, estimates associated with the
individual factors are the same for both one-way models (i.e. models containing only
one of these factors) and the additive model [1] + MateType + Species. These estimates
are also unchanged if the order of the factors in the model is swapped to give model
[1] + Species + MateType. As expected from its interpretation, the value of the estimated
constant (labelled $\mu_{11}$ or $\mu_1$) differs between models.

EXAMPLE 11.2B: GENETICS OF ROOT GROWTH*

We now repeat the analyses of Example 11.1C for this non-orthogonal data set, with the
estimates obtained from both single factor models and the two-factor additive model
listed in Table 11.5. Here, effects labelled as $\eta$ are associated with the Female factor, and
those labelled $\zeta$ are associated with the Male factor.

In these models the parameter $\zeta_2$, associated with the second male parent (M2),
represents the expected difference in root growth with respect to offspring of the first male
parent (M1) for a given female parent. The effect of the rth female parent ($\eta_r$) represents
the expected difference in root growth with respect to offspring of the first female
parent (F1) for a given male parent. The two factors Male and Female are non-orthogonal,
and so the estimates associated with each factor change in value when the other factor
is added to or dropped from the model. For example, the estimated effect of the second
male parent (M2) equals -4.8 mm in the additive model containing both factors,
indicating 4.8 mm less root growth for offspring of the second male parent when compared
with the first male parent. But this estimate becomes -7.1 mm in a model containing the



TABLE 11.5

Estimated Parameters for Several Models for Root Growth, Using First-Level-Zero
Parameterization, with $\eta_1 = 0$, $\zeta_1 = 0$, $(\eta\zeta)_{rs} = 0$ for r = 1 or s = 1 (Example 11.2B)

                                                            Model
Term                  Parameter               [1] + F + M   [1] + F   [1] + M   [1] + M*F

[1]                   $\mu_{11}$ or $\mu_1$       85.1        83.9      81.9       85.3
Female F2             $\eta_2$                    -4.0        -5.2        -        -4.7
Female F3             $\eta_3$                    -9.9       -11.9        -        -7.8
Female F4             $\eta_4$                    -1.1        -0.7        -        -1.5
Female F5             $\eta_5$                    -7.4        -7.4        -        -9.0
Male M2               $\zeta_2$                   -4.8          -       -7.1       -5.8
Female F2.Male M2     $(\eta\zeta)_{22}$            -           -         -         1.8
Female F3.Male M2     $(\eta\zeta)_{32}$            -           -         -        -2.4
Female F4.Male M2     $(\eta\zeta)_{42}$            -           -         -         2.0
Female F5.Male M2     $(\eta\zeta)_{52}$            -           -         -         6.5

Note: In models F = Female, M = Male.

Male factor only, indicating a considerably larger difference. One must therefore
establish a suitable model before reliable inferences can be made.
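
The change in estimates between models can be reproduced by least squares on the data of Table 11.2. The sketch below is one possible implementation, not the algorithm used by any particular package: it builds first-level-zero dummy variables (F1 and M1 as reference levels) and solves the normal equations by Gauss-Jordan elimination using only the standard library. The helper names `design_row`, `least_squares` and `fit` are hypothetical.

```python
# Root growth data from Table 11.2: (female parent, male parent, growth in mm).
DATA = [
    ("F5", "M1", 76), ("F3", "M2", 68), ("F1", "M1", 83), ("F1", "M1", 83),
    ("F4", "M1", 81), ("F4", "M1", 84), ("F1", "M1", 85), ("F3", "M2", 69),
    ("F2", "M1", 82), ("F3", "M1", 75), ("F5", "M2", 77), ("F2", "M2", 75),
    ("F1", "M1", 88), ("F2", "M2", 78), ("F5", "M1", 77), ("F1", "M2", 80),
    ("F2", "M1", 79), ("F2", "M2", 77), ("F1", "M2", 79), ("F4", "M2", 80),
    ("F1", "M1", 84), ("F4", "M1", 83), ("F1", "M1", 89), ("F3", "M2", 70),
    ("F3", "M1", 80), ("F4", "M1", 85), ("F4", "M1", 86), ("F2", "M1", 81),
    ("F5", "M1", 76), ("F3", "M2", 70),
]


def design_row(female, male, terms):
    """Constant plus first-level-zero dummies for the requested factors."""
    row = [1.0]
    if "F" in terms:
        row += [1.0 if female == level else 0.0 for level in ("F2", "F3", "F4", "F5")]
    if "M" in terms:
        row += [1.0 if male == "M2" else 0.0]
    return row


def least_squares(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * v for r, v in zip(X, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(A[k][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for k in range(p):
            if k != col:
                factor = A[k][col] / A[col][col]
                A[k] = [a - factor * b for a, b in zip(A[k], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def fit(terms):
    X = [design_row(f, m, terms) for f, m, _ in DATA]
    y = [float(v) for _, _, v in DATA]
    return least_squares(X, y)


additive = fit("FM")     # [constant, F2, F3, F4, F5, M2]
female_only = fit("F")   # [constant, F2, F3, F4, F5]
male_only = fit("M")     # [constant, M2]

print("M2 effect, additive model:", round(additive[-1], 1))
print("M2 effect, Male-only model:", round(male_only[-1], 1))
```

The two printed estimates should agree, to rounding, with the Male M2 row of Table 11.5, showing how the estimate shifts once the Female factor is dropped.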

The full model for a crossed treatment structure with two factors includes an interaction
term and is written as

$$y_{rsk} = \mu_{11} + \eta_r + \zeta_s + (\eta\zeta)_{rs} + e_{rsk} .$$

First-level-zero constraints are imposed as $\eta_1 = 0$, $\zeta_1 = 0$ and $(\eta\zeta)_{rs} = 0$ when r = 1 or s = 1.
This model corresponds to the crossed explanatory structure

Explanatory component: [1] + A*B

                     = [1] + A + B + A.B

In fhis parameferizafion, pjj is fhe overall consfanf associafed wifh fhe ferm [1], which now 
represenfs fhe populafion mean under the crossed model for the first level of both factors. 
Interpretation of the other parameters also differs somewhat from that in the additive 
models described above. The parameters r|,. (r = 1 ... f^), associated with factor A, repre- 
sent the difference between the rth and first levels of factor A at the first level of factor 
B. Similarly, parameters (s = 1 ... tg), associated with factor B, represent the difference 
between the sth and first levels of factor B at the first level of factor A. Because these param- 
eters now represent different quantities, they also now take different values. The param- 
eters (riQrs (r = 1 ... tj^, s = l ... t^) associated with the interaction term A.B can be thought 
of as deviations relative to the first row and column in the two-way table of unstructured 
treatment effects (see Section 8.2.6), and allow the response to a level of factor A to depend 
on the level of factor B, and vice versa. 

This change in the interpretation of model parameters according to which terms are 
present can make the first-level-zero parameterization confusing. It is important to realize 
that the parameterization used is only a tool to facilitate estimation, and that any valid 
parameterization for a given model results in the same set of fitted values or predictions. 
The predicted value for a given treatment combination (rth level of factor A, sth level of fac- 
tor B) is obtained by addition of the relevant parameter estimates, i.e. for the full crossed 
model 



= An +r\r+L+ (hOrs • 

These predictions for the full crossed model will always be equal to the observed treat- 
ment means. Predictions from the simple additive models are obtained in a similar man- 
ner by adding together the estimated parameters for the terms present in that model. 
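As an illustration of how predictions are assembled from first-level-zero estimates, the following Python sketch adds together the estimates listed for the full crossed model of the root growth experiment (column 6 of Table 11.5). The variable and function names are our own and are not part of any statistical package; parameters constrained to zero are simply entered as zero or omitted.

```python
# First-level-zero estimates for the full crossed model [1] + Male*Female,
# copied from column 6 of Table 11.5. Constrained parameters are zero.
mu_11 = 85.3
eta = {"F1": 0.0, "F2": -4.7, "F3": -7.8, "F4": -1.5, "F5": -9.0}   # Female effects
zeta = {"M1": 0.0, "M2": -5.8}                                      # Male effects
eta_zeta = {("F2", "M2"): 1.8, ("F3", "M2"): -2.4,                  # interaction effects
            ("F4", "M2"): 2.0, ("F5", "M2"): 6.5}


def predict(female, male):
    """Predicted mean = constant + Female effect + Male effect + interaction."""
    return mu_11 + eta[female] + zeta[male] + eta_zeta.get((female, male), 0.0)


# Two-way table of predictions, rounded to one decimal place.
predictions = {(f, m): round(predict(f, m), 1) for f in eta for m in zeta}
for (f, m), value in predictions.items():
    print(f, m, value)
```

Because the published estimates are rounded to one decimal place, a prediction rebuilt in this way can differ from the corresponding observed mean by 0.1 in the last digit.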



EXAMPLE 11.1D: BEETLE MATING

Column 6 of Table 11.4 lists estimates obtained from first-level-zero parameterization
for the full crossed explanatory component, [1] + MateType*Species. The estimated
constant changes when the interaction is added into the model, and is equal to the observed
mean for the first level of both factors (MateType Inter, Species P. vitellinae). Although
the MateType and Species factors are orthogonal, estimates of the main effects also change
when the interaction is added into the model as the interpretation of these parameters
changes. The effect associated with intraspecies mating (MateType Intra, estimate 0.291)
is equal to the difference between observed means for intra- and interspecies mating
for the first level of Species, i.e. P. vitellinae (calculated as 1.804 - 1.513). Table 11.6
demonstrates that predictions from the crossed model are equal to the observed treatment
means, previously shown in Table 8.1a.

TABLE 11.6

Predicted Population Means from First-Level-Zero Parameterization for the Full Crossed
Model for the Beetle Mating Experiment (Example 11.1D)

Species of Female    Mating Type: Interspecies
P. vitellinae        $\hat{\mu}_{11} + \hat{\eta}_1 + \hat{\zeta}_1 + \widehat{(\eta\zeta)}_{11}$ = 1.513
P. vulgatissima      $\hat{\mu}_{11} + \hat{\eta}_2 + \hat{\zeta}_1 + \widehat{(\eta\zeta)}_{21}$ = 1.909

Species of Female    Mating Type: Intraspecies
P. vitellinae        $\hat{\mu}_{11} + \hat{\eta}_1 + \hat{\zeta}_2 + \widehat{(\eta\zeta)}_{12}$ = 1.804
P. vulgatissima      $\hat{\mu}_{11} + \hat{\eta}_2 + \hat{\zeta}_2 + \widehat{(\eta\zeta)}_{22}$ = 2.008

EXAMPLE 11.2C: GENETICS OF ROOT GROWTH*

Column 6 of Table 11.5 lists estimates from first-level-zero parameterization for the full
crossed explanatory component, [1] + Male*Female. These estimates can be used to
obtain the predictions shown in Table 11.7, which are equal to the observed treatment
means.

The estimated constant equals the observed mean for crosses derived from the first
male parent (M1) and the first female parent (F1) (see Table 11.5). Other parameters
are interpreted as described above. For example, the estimated effect associated with
female parent F3 (-7.8) equals the difference between the observed means for crosses
derived from male parent M1 with female parents F3 or F1 (77.5 - 85.3 = -7.8).

Interpretation of parameters under first-level-zero constraints becomes increasingly
complex as higher-order interactions are added into the model. As a general rule,
individual parameter estimates are not of particular interest, except as components of
predictions. We use ANOVA to determine which model terms are required to give a good
description of a data set, and this also gives an estimate of background variation that can
be used to estimate standard errors of parameter estimates and predictions.



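TABLE 11.7

Predicted Population Means from First-Level-Zero Parameterization for the Full
Crossed Model for the Root Growth Experiment (Example 11.2C)

Female Parent    Male Parent M1
F1               $\hat{\mu}_{11} + \hat{\eta}_1 + \hat{\zeta}_1 + \widehat{(\eta\zeta)}_{11}$ = 85.3
F2               $\hat{\mu}_{11} + \hat{\eta}_2 + \hat{\zeta}_1 + \widehat{(\eta\zeta)}_{21}$ = 80.7
F3               $\hat{\mu}_{11} + \hat{\eta}_3 + \hat{\zeta}_1 + \widehat{(\eta\zeta)}_{31}$ = 77.5
F4               $\hat{\mu}_{11} + \hat{\eta}_4 + \hat{\zeta}_1 + \widehat{(\eta\zeta)}_{41}$ = 83.8
F5               $\hat{\mu}_{11} + \hat{\eta}_5 + \hat{\zeta}_1 + \widehat{(\eta\zeta)}_{51}$ = 76.3

Female Parent    Male Parent M2
F1               $\hat{\mu}_{11} + \hat{\eta}_1 + \hat{\zeta}_2 + \widehat{(\eta\zeta)}_{12}$ = 79.5
F2               $\hat{\mu}_{11} + \hat{\eta}_2 + \hat{\zeta}_2 + \widehat{(\eta\zeta)}_{22}$ = 76.7
F3               $\hat{\mu}_{11} + \hat{\eta}_3 + \hat{\zeta}_2 + \widehat{(\eta\zeta)}_{32}$ = 69.3
F4               $\hat{\mu}_{11} + \hat{\eta}_4 + \hat{\zeta}_2 + \widehat{(\eta\zeta)}_{42}$ = 80.0
F5               $\hat{\mu}_{11} + \hat{\eta}_5 + \hat{\zeta}_2 + \widehat{(\eta\zeta)}_{52}$ = 77.0



11.2.2 Assessing the Importance of Non-Orthogonal Terms: The Sequential
ANOVA Table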

As in previous chapters, ANOVA is used to partition the total variation into components
associated with individual model terms and background variation. We do not give details
of how to calculate the ANOVA here; instead we obtain the tables directly from statistical
software. For non-orthogonal designs, the sums of squares in the ANOVA table may
depend on the order in which terms are added into the model. It is therefore necessary to
consider several sequences of sub-models that add terms into the full model in different
orders. When constructing sequences of sub-models we respect the principle of marginality
(see Section 8.2.1) and add a term only if all possible sub-terms are already in the model.

Consider a two-way crossed treatment structure with factors A and B, as defined above
(Section 11.2.1). We consider the two sequences of sub-models shown in Figure 11.1, both
of which start with the baseline model containing the overall constant. In Sequence 1, we
first add factor A, then factor B, and then the A.B interaction term to get the full
explanatory component as [1] + A + B + A.B. In the second sequence, the roles of factors A
and B are reversed to obtain [1] + B + A + A.B. The principle of marginality means that we
cannot add the interaction A.B before either of the main effects A or B, and we must include
the overall constant first, so only these two sequences are valid.

Before proceeding, we need to introduce some new terminology. For any sub-model,
we define its model sum of squares (ModSS) as the sum of squares accounted for by that
sub-model. We identify a model sum of squares by explicitly specifying the sub-model it
refers to within parentheses, so ModSS([1] + A + B) is the sum of squares associated with
sub-model [1] + A + B. Similarly, the model df (ModDF) is defined as the total df associated
with the sub-model. Note that, because the overall mean is eliminated from the total sum
of squares and the df, ModSS([1]) = 0 and ModDF([1]) = 0.

Sequence 1:  [1]  --(+A)-->  [1] + A  --(+B)-->  [1] + A + B  --(+A.B)-->  [1] + A + B + A.B
Sequence 2:  [1]  --(+B)-->  [1] + B  --(+A)-->  [1] + B + A  --(+A.B)-->  [1] + B + A + B.A

Baseline model [1]:  $y_{rsk} = \mu + e_{rsk}$
Full model:          $y_{rsk} = \mu_{11} + \eta_r + \zeta_s + (\eta\zeta)_{rs} + e_{rsk}$

FIGURE 11.1

Symbolic and algebraic forms of models for a two-way crossed structure obtained by
sequentially adding one term at a time, respecting marginality.

EXAMPLE 11.1E: BEETLE MATING

The model sums of squares and df are listed in Table 11.8. In this orthogonal design,
we can calculate the sum of squares for the additive model [1] + Species + MateType
(ModSS = 1.2838) by adding together the sums of squares associated with the two
sub-models: [1] + Species (ModSS = 0.9031) and [1] + MateType (ModSS = 0.3807).

EXAMPLE 11.2D: GENETICS OF ROOT GROWTH*

The model sums of squares and df are listed in Table 11.9. In this non-orthogonal case,
the sum of squares for the additive model [1] + Female + Male (ModSS = 747.11) is less
than that obtained by addition of the sums of squares associated with the two sub-models
[1] + Female (ModSS = 610.62) and [1] + Male (ModSS = 354.08), which together give 964.70.

The ModSS, ModDF and parameter estimates for two (sub-)models containing the same
terms are always identical, regardless of the order of fitting. Hence, for a two-way crossed
treatment structure, the sum of squares and df for the two versions of the full model,
[1] + A*B and [1] + B*A, are equal. Similarly, the model SS, df and estimates for the additive
model [1] + A + B are the same as for model [1] + B + A.

An ANOVA table for each sequence of sub-models, called a sequential ANOVA table,
can be derived from the model sums of squares and df. When a new term is added into the
model, changes in the ModSS and ModDF are attributed to that new term. These changes
are called the incremental or sequential sum of squares and df, respectively. To avoid
ambiguity, the incremental sum of squares and degrees of freedom (denoted as SS and
DF) are labelled by both the new term added and terms already in the model. For example,
moving from model [1] + A to [1] + A + B we describe the change as +B | ([1] + A), to be read
as 'adding factor B given that the overall mean and factor A are already in the model', or
equivalently, 'adding factor B after eliminating the overall mean and factor A'. The forms of
the incremental sums of squares for model sequence [1] + A + B + A.B are shown in Table
11.10. The incremental df are derived similarly from the ModDF.

TABLE 11.8

Model df (ModDF) and Sums of Squares (ModSS) for a Crossed Model and Its Sub-Models for the
Beetle Mating Experiment (Example 11.1E)

Sequence 1
Model                        ModDF   ModSS
[1] + MateType                 1     0.3807
[1] + MateType + Species       2     1.2838
[1] + MateType*Species         3     1.3751

Sequence 2
Model                        ModDF   ModSS
[1] + Species                  1     0.9031
[1] + Species + MateType       2     1.2838
[1] + Species*MateType         3     1.3751

TABLE 11.9

Model df (ModDF) and Sums of Squares (ModSS) for a Crossed Model and Its Sub-Models for the
Root Growth Experiment (Example 11.2D)

Sequence 1
Model                        ModDF   ModSS
[1] + Female                   4     610.62
[1] + Female + Male            5     747.11
[1] + Female*Male              9     788.78

Sequence 2
Model                        ModDF   ModSS
[1] + Male                     1     354.08
[1] + Male + Female            5     747.11
[1] + Male*Female              9     788.78

TABLE 11.10

Incremental Sums of Squares for a Two-Way Crossed Structure Fitted as [1] + A + B + A.B

Model          Change                  Incremental Sum of Squares
[1] + A        +A | [1]                SS(+A | [1]) = ModSS([1] + A) - ModSS([1])
[1] + A + B    +B | ([1] + A)          SS(+B | [1] + A) = ModSS([1] + A + B) - ModSS([1] + A)
[1] + A*B      +A.B | ([1] + A + B)    SS(+A.B | [1] + A + B) = ModSS([1] + A*B) - ModSS([1] + A + B)

TABLE 11.11

Structure of the Sequential ANOVA Table for a Two-Way Crossed Structure Fitted as
[1] + A + B + A.B

Term Added   Incremental SS   Incremental df   Mean Square                      Variance Ratio
+ A          SS(+A)           DF(+A)           MS(+A) = SS(+A)/DF(+A)           MS(+A)/ResMS
+ B          SS(+B)           DF(+B)           MS(+B) = SS(+B)/DF(+B)           MS(+B)/ResMS
+ A.B        SS(+A.B)         DF(+A.B)         MS(+A.B) = SS(+A.B)/DF(+A.B)     MS(+A.B)/ResMS
Residual     ResSS            ResDF            ResMS = ResSS/ResDF
Total        TotSS            TotDF

In the context of a sequential ANOVA table, we can use some abbreviations by considering
the table as a whole. For example, instead of listing the change and the terms already
present in the model, we can deduce the terms already in the model from previous lines
in the ANOVA table and just indicate the change by using '+' with the name of the term
added into the model. For example, in Table 11.10, SS(+B) and DF(+B) can be used as
shorthand to indicate SS(+B | [1] + A) and DF(+B | [1] + A), respectively.
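The incremental sums of squares and df are just successive differences of the model sums of squares and df within a sequence. The Python sketch below illustrates this with the values listed in Table 11.9 for the root growth experiment; the function name `incremental` and the data layout are our own choices for illustration.

```python
# Model df and sums of squares for each sequence of sub-models, copied from
# Table 11.9. Each sequence starts from the baseline model [1], for which
# ModDF = 0 and ModSS = 0.
sequence_1 = [("[1]", 0, 0.0),
              ("[1] + Female", 4, 610.62),
              ("[1] + Female + Male", 5, 747.11),
              ("[1] + Female*Male", 9, 788.78)]
sequence_2 = [("[1]", 0, 0.0),
              ("[1] + Male", 1, 354.08),
              ("[1] + Male + Female", 5, 747.11),
              ("[1] + Male*Female", 9, 788.78)]


def incremental(sequence):
    """Return (model reached, incremental df, incremental SS) for each step."""
    rows = []
    for (_, df0, ss0), (model, df1, ss1) in zip(sequence, sequence[1:]):
        rows.append((model, df1 - df0, round(ss1 - ss0, 2)))
    return rows


inc_1 = incremental(sequence_1)
inc_2 = incremental(sequence_2)
for row in inc_1 + inc_2:
    print(row)
```

Because the tabulated ModSS values are rounded to two decimal places, differences computed in this way can disagree with sums of squares reported by statistical software in the last digit.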

We can now derive a full sequential ANOVA table. The residual sum of squares (ResSS)
and df (ResDF) are those associated with the full model, so here ResSS = TotSS -
ModSS([1] + A*B), and ResDF = N - 1 - ModDF([1] + A*B). Mean squares are calculated by
division of the incremental sums of squares by their incremental df. The designs considered
here have no structure, so the variance ratios for each term are all calculated with respect
to the ResMS. The structure of one sequential ANOVA table for a two-way crossed structure
is shown in Table 11.11. The first row gives the change on addition of term A to the
baseline model containing only the overall constant term [1]. The second row corresponds to
the change when term B is added to a model that already contains [1] + A, and is often
described as 'the sum of squares for B after eliminating A'. The third row corresponds to
the change when the interaction is added to a model containing both main effects
([1] + A + B). An analogous table can be derived for the other sequence. For each line of
the ANOVA table, the variance ratio can be used as a test statistic for the null hypothesis
that addition of the term gives no improvement to the current model (explains no additional
variation). The df for the associated F-test are, as usual, given by the df associated with
the numerator and denominator mean squares of the variance ratio.
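The arithmetic behind each line of a sequential ANOVA table can be sketched in a few lines of Python. The example below uses the incremental df and sums of squares, and the residual line, reported for Sequence 1 of the root growth experiment (Table 11.13); the variable names are our own, and in practice the table would be obtained directly from statistical software.

```python
# Incremental df and SS for Sequence 1 of the root growth experiment and the
# residual line, copied from Table 11.13 (F = Female, M = Male).
terms = [("+ F", 4, 610.62), ("+ M", 1, 136.48), ("+ F.M", 4, 41.68)]
res_df, res_ss = 20, 73.88

res_ms = res_ss / res_df            # residual mean square
table = []
for term, df, ss in terms:
    ms = ss / df                    # mean square = incremental SS / incremental df
    vr = ms / res_ms                # variance ratio with respect to ResMS
    table.append((term, df, ss, ms, vr))

for term, df, ss, ms, vr in table:
    print(f"{term:<9}{df:>3}{ss:>9.2f}{ms:>9.2f}{vr:>9.3f}")
print(f"{'Residual':<9}{res_df:>3}{res_ss:>9.2f}{res_ms:>9.2f}")
```

If SciPy is available, the observed significance level for each line could be obtained from the upper tail of the F-distribution, for example with `scipy.stats.f.sf(vr, df, res_df)`.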

EXAMPLE 11.1F: BEETLE MATING

The two sequential ANOVA tables for the full crossed model are in Table 11.12. In this
orthogonal design, the incremental sums of squares are the same in both sequences.
For example, the sum of squares for Species eliminating MateType (i.e. +Species in
Sequence 1) is the same as that for Species ignoring MateType (+Species in Sequence
2), and the conclusions are exactly the same for both model sequences.

TABLE 11.12

Sequential ANOVA Tables for a Crossed Model for the Beetle Mating Experiment (Example 11.1F)

Sequence 1
Source      df     SS       MS        VR
+ M          1   0.3807   0.3807   15.992
+ S          1   0.9031   0.9031   37.932
+ M.S        1   0.0914   0.0914    3.837
Residual    36   0.8571   0.0238
Total       39   2.2322

Sequence 2
Source      df     SS       MS        VR
+ S          1   0.9031   0.9031   37.932
+ M          1   0.3807   0.3807   15.992
+ S.M        1   0.0914   0.0914    3.837
Residual    36   0.8571   0.0238
Total       39   2.2322

Note: M denotes factor MateType and S denotes factor Species, df = incremental df, SS = incremental sum of
squares, MS = mean square, VR = variance ratio.

EXAMPLE 11.2E: GENETICS OF ROOT GROWTH*

The two sequential ANOVA tables are in Table 11.13. In this non-orthogonal design, the
incremental sum of squares on addition of the Male main effect (denoted M in Table
11.13) to the model differs according to whether the Female main effect (denoted F in
Table 11.13) has already been added to the model (eliminated) or not (ignored). A similar
pattern is present for the Female main effects but, as expected, the incremental sum
of squares for the interaction term (Female.Male) is the same in both sequences. The
conclusions are the same from both sequences: there is some evidence of an interaction
($F^{F.M}_{4,20}$ = 2.821, P = 0.053), but there is very strong evidence from both sequences
that both main effects are required in the model.

TABLE 11.13

Sequential ANOVA Tables for a Crossed Model for the Root Growth Experiment (Example 11.2E)

Sequence 1
Source      df     SS       MS       VR
+ F          4   610.62   152.66   41.324
+ M          1   136.48   136.48   36.945
+ F.M        4    41.68    10.42   $F^{F.M}$ = 2.821
Residual    20    73.88     3.69
Total       28   862.67

Sequence 2
Source      df     SS       MS       VR
+ M          1   354.08   354.08   95.849
+ F          4   393.02    98.26   26.597
+ M.F        4    41.68    10.42   $F^{M.F}$ = 2.821
Residual    20    73.88     3.69
Total       28   862.67

Note: M denotes factor Male and F denotes factor Female, df = incremental df, SS = incremental sum of squares,
MS = mean square, VR = variance ratio.

In some cases, interpretation of the ANOVA table is less straightforward. For example,
suppose that observed significance levels from the ANOVA table for a two-way crossed
model, [1] + A*B, took the values listed in Table 11.14. In this case, Sequence 1 gives strong
evidence that factor A accounts for variation in the response but that, once this term has
been taken into account, addition of factor B and the interaction term into the model
accounts for no further variation. In contrast, Sequence 2 indicates that when factor A is
ignored, there is evidence that factor B accounts for variation in the response, although the
interaction is still not significant. We need to put this information together in a way that
makes sense. The fact that B is significant only when A is not in the model suggests that
there is some association, or confounding, between levels of A and B.

TABLE 11.14

Observed Significance Levels (P) from Sequential ANOVA
Tables for a Two-Way Crossed Model with Factors A and B

Sequence 1                Sequence 2
Term Added     P          Term Added     P
+ A          0.01         + B          0.04
+ B          0.08         + A          0.02
+ A.B        0.32         + B.A        0.32

In general, we prefer the simplest model that describes variation in the response - a
parsimonious model. Here, this would be the model that contains factor A only, as Sequence 1
tells us that we do not need factor B or the interaction once we have factor A in the model.
In selecting a model we also obey the principle of marginality; this implies that we work
upwards from the bottom of the ANOVA table(s), as described in Section 8.3.

This process of model selection is reasonably straightforward for a model with two 
treatment factors, but becomes more complex when more factors and their interactions are 
present. The principle of marginality requires that we should fit the main effects before 
two-factor interactions, two-factor interactions before three-factor interactions and so on, 
but this may still result in a large number of valid sequences of sub-models to be compared 
(see Section 8.3). For this reason, different types of sum of squares have been developed to 
aid in model identification, and these are described in the next subsection. 



11.2.3 Calculating the Impact of Model Terms 

The incremental sums of squares described above, i.e. the change in the model sum of
squares on addition of a new term into the model, are sometimes called the Type I SS.
These sums of squares are widely used but, as shown above, they have the disadvantage
that the value of the Type I SS for a given term changes according to the order in which the
model terms are specified.

The Type II SS for a term is usually defined as the incremental sum of squares obtained
when that term is added to a model that contains all terms marginal to itself. For example,
consider a three-way crossed model containing all main effects and interactions for factors
A, B and C. The Type II SS for the term A.B is then the incremental sum of squares obtained
when term A.B is added to the model [1] + A + B. Sometimes the Type II SS is alternatively
defined as the incremental sum of squares obtained when the term is added to a model
that contains all other terms of lower or equal order. Using this second definition, the Type
II SS for term A.B in our three-factor example is the incremental sum of squares obtained
from adding the term A.B to the model [1] + A + B + C + A.C + B.C. Under both definitions,
the Type II SS for term A.B.C would be the incremental sum of squares obtained when
the three-way interaction is added to a model containing all main effects and two-factor
interactions, i.e. [1] + A + B + C + A.B + A.C + B.C. These sums of squares can be useful in
helping to establish a sensible model without having to refit terms in different orders, but
they are not available in some statistical software.
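The two definitions of the Type II SS differ only in the model to which the term is added. The following Python sketch lists the terms in that model under each definition; it is an illustration only, the function names are our own, and terms are written as factor names joined by '.' in the same order as the list of factors.

```python
from itertools import combinations


def factors(term):
    """Split a term such as 'A.B' into its factors ('A', 'B')."""
    return tuple(term.split("."))


def marginal_terms(term):
    """First definition: all terms marginal to `term` (its proper sub-terms)."""
    fs = factors(term)
    return [".".join(c) for k in range(1, len(fs)) for c in combinations(fs, k)]


def lower_or_equal_order_terms(term, all_factors):
    """Second definition: all other terms of lower or equal order."""
    order = len(factors(term))
    terms = [".".join(c) for k in range(1, order + 1)
             for c in combinations(all_factors, k)]
    return [t for t in terms if t != term]


# The overall constant [1] is always present and is not listed explicitly.
print(marginal_terms("A.B"))
print(lower_or_equal_order_terms("A.B", ["A", "B", "C"]))
print(marginal_terms("A.B.C"))
```

For the three-way interaction the two functions return the same list of terms, in agreement with the statement that both definitions coincide for A.B.C.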

The Type III SS (sometimes also called marginal or drop-one-out SS) are more complex,
but broadly correspond to the change in the model sum of squares obtained when a term
is dropped from the full model. Type IV SS are similar to Type III SS in principle, but use
a slightly different calculation when there are no data for some treatment combinations.
Further details of Type III and IV sums of squares can be found in Milliken and Johnson
(2001). The Type I and Type III SS are always equal for the last term added into the model.
We endorse the use of Type III SS whilst respecting the principle of marginality, as Type
III SS may be inappropriate for any term that is a sub-term of one or more other terms in
the model. We therefore should not use Type III SS to test terms that are marginal to other
terms present in the model. For example, we should not calculate Type III SS for factor A
whilst term A.B is present in the model.



11.2.4 Selecting the Best Model 

The process of model selection applies to the explanatory component only; the structural
component is used to obtain the correct strata and tests and so structural terms should
never be dropped. Model selection for the explanatory component can proceed based on
Type III SS, respecting marginality, as these allow us to start with the full model and to drop
terms progressively. This process was described in Section 8.3 for an orthogonal three-way
crossed structure. In that orthogonal case, the Type I and Type III SS are equivalent, and all
tests could be obtained from a single ANOVA table. For non-orthogonal structures, the
procedure is somewhat more complex. At each step, we use Type III SS to test all model terms
that are not marginal to other terms still present in the model, i.e. only terms that are not
contained within another term. For a fully crossed model, this means that, as a first step,
we can test only the highest-order interaction. This process was illustrated in Figure 8.4.
At each step, if any of the tested terms is not significant, then the least significant (largest
observed significance level) can be dropped from the model. As each term is dropped, the
model is refitted, and the process continues until no further terms can be dropped.

Because this process becomes more complex as the number of factors increases, automatic
model selection procedures, such as stepwise selection, are sometimes advocated.
These methods are discussed in detail in Section 14.9.1 in the context of regression models
with quantitative explanatory variables and in Section 15.6 for models with factors. Here,
we merely note that these methods must respect marginality to be valid, and that the
procedure described above is equivalent to backward elimination. Forward selection is
implemented by addition of terms based on the incremental sums of squares, again respecting
marginality, and this method may be inadvisable if interactions are present in the absence
of main effects, as the procedure will then stop too soon. In practice, the stepwise selection
algorithms implemented in statistical software do not usually respect marginality, and so
some intervention will often be required.



11.2.5 Evaluating the Response to Treatments: Predictions from the Fitted Model 

Recall that prediction is the use of the fitted model to estimate functions of the explanatory 
variable(s). For example, we might want to predict the population mean for a given experi- 
mental treatment. In Section 8.2.4, for orthogonal structures, we used observed means to 
predict the effect of one factor whilst averaging across levels of other factors in the model. 
For a non-orthogonal structure this approach is not usually efficient, and is not possible
when some treatment combinations are missing. A more general approach is therefore
required, which also gives us the opportunity to consider other types of model predictions.

All predictions are based on the final fitted model. As we have already seen in Section
11.2.1, once a model has been selected, the predicted value for any treatment combination
is equal to its fitted value based on this model. The full set of model predictions can
be presented in a multi-way table classified by the treatment factors. To produce
predictions for individual factors, or combinations of factors, we can take marginal means
of this multi-way table. As in Chapter 8, we take account of the form of the final model:
where significant interactions are present, we usually make predictions for combinations of
factors rather than individual factors.

EXAMPLE 11.2F: GENETICS OF ROOT GROWTH*

In Example 11.2E, the interaction term was not quite significant at the 5% significance
level (P = 0.053), and the interaction mean square was much smaller than those for main
effects, and so we decide to omit the interaction from the model. The explanatory structure
of the selected model thus takes the form: [1] + Female + Male. Table 11.15 contains the
predictions from this model for offspring of all possible crosses, formed as

$$\hat{\mu}_{rs} = \hat{\mu}_{11} + \hat{\eta}_r + \hat{\zeta}_s ,$$

using the estimates shown in column 3 of Table 11.5. Because the interaction term has
been omitted from the final fitted model, the predictions are not equal to the observed
treatment means listed in Table 11.7, although the differences are small since the
interaction effects were also small. The precision of the predictions, represented by their
standard errors, reflects the amount of information available for the individual treatment
combinations.

Figure 11.2 plots these predictions, joining points by lines to make the pattern clear. In
this additive model the lines are parallel, demonstrating that the expected difference in
root growth between the two male parents is the same for each female parent. We can
therefore simplify our summary of the model by taking marginal means of the predictions
with respect to each factor separately; these predictions are also listed in Table
11.15. For example, the marginal predictions for females F1 and F3 show that offspring of
female F3 tend to have on average 9.9 mm less root growth than offspring of female F1.
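The predictions and equally weighted marginal means for this additive model can be reproduced from the estimates in column 3 of Table 11.5. The Python sketch below is illustrative only (the names are our own) and does not attempt to compute the standard errors, which require the full fitted model.

```python
# First-level-zero estimates for the additive model [1] + Female + Male,
# copied from column 3 of Table 11.5.
mu = 85.1
eta = {"F1": 0.0, "F2": -4.0, "F3": -9.9, "F4": -1.1, "F5": -7.4}   # Female effects
zeta = {"M1": 0.0, "M2": -4.8}                                      # Male effects

# Two-way table of predictions for all crosses.
pred = {(f, m): mu + eta[f] + zeta[m] for f in eta for m in zeta}

# Marginal means with equal weights for the levels of the other factor.
female_margin = {f: sum(pred[f, m] for m in zeta) / len(zeta) for f in eta}
male_margin = {m: sum(pred[f, m] for f in eta) / len(eta) for m in zeta}

for f in eta:
    print(f, [round(pred[f, m], 1) for m in zeta], round(female_margin[f], 1))
print("Margin", [round(male_margin[m], 1) for m in zeta])
```

The difference between the marginal predictions for F3 and F1 equals the estimated effect of F3, because the additive model contains no interaction.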

TABLE 11.15

Predictions with Standard Errors (SE) and Replication (n) from Explanatory Component
[1] + Male + Female for the Root Growth Experiment (Example 11.2F)

             Male M1                  Male M2                  Margin
Female    Prediction   SE     n    Prediction   SE     n    Prediction   SE     n
F1           85.1     0.81    6       80.3     1.03    2       82.7     0.81    8
F2           81.1     1.00    3       76.3     1.00    3       78.7     0.90    6
F3           75.2     1.08    2       70.4     0.94    4       72.8     0.91    6
F4           84.0     0.91    5       79.2     1.17    1       81.6     0.94    6
F5           77.7     1.12    3       72.9     1.29    1       75.3     1.12    4
Margin       80.6     0.53   19       75.8     0.70   11       78.2     0.43   30

FIGURE 11.2

Predicted root length (mm) for offspring of each cross of two male (o M1, • M2) and five female (F1-F5) parents
from explanatory component [1] + Male + Female (Example 11.2F).

As well as using predictions to summarize the fitted model, we can also use them to
estimate the expected outcome for specific scenarios. In this case, we might take marginal
means even when interactions are present, and we might also use weights when
taking marginal means. For example, consider an experiment done to evaluate the yield
potential of several varieties (factor Variety) in several regions of a country (factor Region),
with the aim of predicting potential yields under various scenarios of variety allocation.
The experimental results can be summarized by use of the explanatory component
[1] + Variety*Region. From the fitted model, we can produce a two-way table of predictions
for the yield of each variety in each region. If we wish to compare expected regional yields,
then we need somehow to form averages of the predicted means across the varieties within
each region. If we simply take marginal means of the two-way table, then we are taking
averages by using equal weights for each variety in each region. This may be unrealistic
if the area suitable for each variety varies across regions. Therefore, we could use weights
based on the expected area of each variety grown within each region, allowing this to
vary across regions. Many different weighting schemes are possible, but the weights used
should be appropriate to the type of prediction desired.
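The effect of the weighting scheme on regional predictions can be seen in a small numerical sketch. All numbers below are invented purely for illustration (they do not come from any experiment described here), as are the variety and region labels; the function `marginal_mean` is our own.

```python
# Hypothetical predicted yields (t/ha) from [1] + Variety*Region.
pred = {"North": {"V1": 8.0, "V2": 7.0, "V3": 6.0},
        "South": {"V1": 6.5, "V2": 7.5, "V3": 7.0}}

# Hypothetical expected proportion of area sown to each variety in each region.
area = {"North": {"V1": 0.6, "V2": 0.3, "V3": 0.1},
        "South": {"V1": 0.2, "V2": 0.3, "V3": 0.5}}


def marginal_mean(values, weights=None):
    """Weighted mean of predictions; equal weights are used if none are given."""
    if weights is None:
        weights = {k: 1.0 for k in values}
    total = sum(weights[k] for k in values)
    return sum(weights[k] * values[k] for k in values) / total


equal = {r: marginal_mean(pred[r]) for r in pred}
weighted = {r: marginal_mean(pred[r], area[r]) for r in pred}
print(equal)
print(weighted)
```

With these invented values the two regions have the same equally weighted mean, but the area-weighted means differ, which shows how the choice of weights can change the comparison between regions.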

The process is further complicated if some of the predictions for individual treatment 
combinations cannot be estimated. This can happen when interactions are significant and 
so retained in the model but some of the corresponding treatment combinations are absent, 
for example if some varieties are not grown in some of the regions. In this case, it is not 
possible to obtain a reliable prediction for the missing variety x region combinations and 
so the multi-way table of predictions contains missing values. One can obtain marginal 
means by assigning zero weight to these missing combinations, but this action obviously 
affects both the composition and interpretation of the resulting predictions. Some thought 
is required to form appropriate predictions in this situation and some of these issues are 
discussed by Lane and Nelder (1982). 

Any of the predictions described above can be written as a linear combination of the 
model parameters. Some algorithms produce predictions that are formulated as marginal 
means of the multi-way table of predictions classified by all factors in the model. Other 
algorithms require direct specification of the coefficients in the required linear combina- 
tion; many statistical packages allow specification in either form. 
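As a small illustration of a prediction written as a linear combination of model parameters, the Python sketch below forms the marginal prediction for female F3 in the additive model [1] + Female + Male, using the estimates from column 3 of Table 11.5. The ordering of the parameters and the variable names are our own choices.

```python
# Estimates for [1] + Female + Male in the order: constant, F2, F3, F4, F5, M2.
# Parameters constrained to zero (F1, M1) are omitted.
estimates = [85.1, -4.0, -9.9, -1.1, -7.4, -4.8]

# Coefficients for the prediction for female F3 averaged over the two male
# parents with equal weights: constant + F3 effect + half of the M2 effect.
coefficients = [1.0, 0.0, 1.0, 0.0, 0.0, 0.5]

prediction = sum(c * b for c, b in zip(coefficients, estimates))
print(round(prediction, 1))
```

The result agrees with the marginal prediction of 72.8 listed for F3 in Table 11.15; a different weighting of the male parents would simply change the coefficient attached to the M2 effect.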



11.3 Designs with Planned Non-Orthogonality 

The previous sections have considered non-orthogonality between treatment factors and
the problems that this can cause. Non-orthogonality can also occur between blocking and
treatment factors and this again complicates the statistical analysis. Many of the designs
we have considered so far avoid these complications by ensuring that block factors are
orthogonal to treatment factors. For example, in the RCBD, LS and SP designs, the number
of units per block is equal to the number of treatments, so that each treatment occurs once
in each block and therefore, using the criteria of Section 11.1, the block and treatment
factors are orthogonal. We have analysed these designs using multi-stratum ANOVA. When
blocks and treatments are non-orthogonal, the multi-stratum ANOVA can be sensibly
constructed only if the design is balanced. Recall that a design is called balanced if all
treatment comparisons (differences) are estimated with the same precision (Section 3.2).
Orthogonal designs with equal replication are balanced, because all treatments occur once
in each block and thus all comparisons have the same precision, i.e. equal SEDs. The BIBD
(see Section 9.3) is an example of a design with a non-orthogonal structure between blocking
and treatment factors, as it uses blocks with fewer experimental units than the number
of treatments. This design creates balance by ensuring that, when considered across
the whole experiment, each pair of treatments occurs together equally often within the
same blocks. Moreover, a certain proportion of the information on treatment differences
can be obtained from intra-block comparisons, with the remainder being obtained from
inter-block comparisons, and these proportions are the same for any treatment difference.
This decomposition of the information on treatments can be summarized in a multi-stratum
ANOVA table, as described in Section 9.3. The same principle can be extended to
analyse partially balanced incomplete block designs (Section 9.3.3) within the framework
of multi-stratum ANOVA. However, for a given block size, obtaining balanced (or partially
balanced) incomplete block designs becomes more challenging as the number of
experimental treatments increases. This is particularly relevant with factorial structures,
because of the large number of treatment combinations that may be generated, and there
are two classes of design that exploit planned non-orthogonality to reduce the resources
required. Fractional factorial designs (Section 11.3.1) use a carefully chosen subset of the
full set of treatment combinations to provide information about main effects and low-order
interactions within a reduced number of experimental units, often with low levels of
replication. Factorial designs with confounding enable the efficient allocation of factorial
treatment combinations to small blocks (Section 11.3.2).

11.3.1 Fractional Factorial Designs 

Fractional Factorial Designs (FFDs) are useful where the number of factorial treatment 
combinations is considered too large for full replication of all combinations, possibly 
because the number of treatment combinations is substantially larger than the number 
of experimental units in the natural blocking structures, and where interest is focussed 
on main effects and low-order interactions. The construction of a FFD involves the a priori 
assumption that some high-order interaction effects will be negligible, so that these can 
be aliased with the main effects and low-order interactions that are of interest. This usu- 
ally requires some prior detailed knowledge and experience of the system under study. 
Given the natural block size, the challenge is then to identify an aliasing structure (i.e. the 
sets of high-order interaction effects that will be aliased with each main effect and low- 
order interaction effect) that allows all of the main effects and selected low-order interac- 
tions to be estimated. The aliasing structure defines the subset of treatment combinations 
to be used in the design. Often replication will still be possible, and the same subset of 
treatments will usually be repeated in each replicate block. Replication provides the main 
basis for the estimation of the background variation, although any estimates of high-order
interactions can also be assigned to the estimation of the background variation, given that
we have assumed that these effects are negligible. Further exploration of FFDs is beyond
the scope of this book, but details can be found in Mead et al. (2012, Chapter 14).
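The idea of an aliasing structure can be illustrated with a minimal sketch. It assumes four two-level factors (hypothetically named A, B, C and D, coded -1/+1) and the defining relation I = ABCD, which is one common textbook choice and is not taken from this chapter. The half fraction keeps only the runs for which the ABCD contrast equals +1, and the code checks that each main effect is then indistinguishable from a three-factor interaction.

```python
from itertools import product

FACTORS = "ABCD"  # hypothetical factor names, two levels each coded -1/+1


def half_fraction():
    """Runs of a 2^(4-1) design with defining relation I = ABCD."""
    return [run for run in product((-1, 1), repeat=4)
            if run[0] * run[1] * run[2] * run[3] == 1]


def contrast(run, term):
    """Contrast coefficient of an effect such as 'A' or 'BCD' for one run."""
    value = 1
    for name in term:
        value *= run[FACTORS.index(name)]
    return value


runs = half_fraction()
print(len(runs), "runs instead of 16")
for term, alias in [("A", "BCD"), ("B", "ACD"), ("AB", "CD")]:
    same = all(contrast(r, term) == contrast(r, alias) for r in runs)
    print(term, "is aliased with", alias, ":", same)
```

Within the selected runs the contrast for A is identical to that for BCD, so the estimate of the main effect of A is valid only if the BCD interaction is assumed negligible. Two-factor interactions are aliased in pairs (AB with CD), which is why this particular fraction suits studies focused mainly on main effects.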



11.3.2 Factorial Designs with Confounding 

Whether the available resources allow replication of the full set of factorial combinations,
or just some fractional subset (as discussed in Section 11.3.1), the natural block size may
still be too small to include the full set, or subset, of factorial combinations. In this case,
the approach of factorial designs with confounding can be used, which divides the full
set, or subset, of factorial combinations into groups to be assigned to different blocks. As
with the FFD discussed in Section 11.3.1, we have to make the a priori assumption that
some high-order interaction effects are negligible, and we can then confound these effects
with the differences between blocks. This means that it is not possible to identify whether
block differences are due to structural variability or the confounded high-order interaction
effect (assumed to be negligible). In this scenario, it is often necessary to assign some
high-order interactions (assumed negligible) to provide our estimate of the background
variation, though replication may also provide some information. The strategy for choosing
the terms to be confounded with blocks depends on the relationship between the number
of factorial treatment combinations included and the natural block size, and on the
comparisons of most interest. Further exploration of this design approach is beyond the
scope of this book, but, again, further details can be found in Mead et al. (2012, Chapter 14).
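As a hedged illustration of confounding, the sketch below splits a 2^3 factorial (hypothetical factors A, B and C, coded -1/+1) into two blocks of four units by the sign of the three-factor interaction contrast. This is a standard textbook construction chosen for illustration, not a design taken from this chapter.

```python
from itertools import product


def split_into_blocks():
    """Allocate the eight combinations of a 2^3 factorial to two blocks of four,
    confounding the three-factor interaction ABC with the block difference."""
    blocks = {-1: [], 1: []}
    for a, b, c in product((-1, 1), repeat=3):
        blocks[a * b * c].append((a, b, c))
    return blocks


blocks = split_into_blocks()
for sign, runs in blocks.items():
    print("ABC =", sign, runs)

# Main effects remain orthogonal to blocks: within each block, every factor
# appears twice at each level, so its coded levels sum to zero.
for runs in blocks.values():
    for position in range(3):
        assert sum(run[position] for run in runs) == 0
```

Because the ABC contrast is constant within each block, any difference between the two block means mixes the block effect with the ABC interaction, which is exactly the loss of information described above.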



11.4 The Consequences of Missing Data 

Missing responses occur frequently in scientific studies, sometimes because of an error in
the experimental procedure so that a planned observation either has not been obtained
or is identified as unreliable. Occasionally most, or even all, of the observations
associated with one or more treatment combinations are missing. This may happen because of
experimental error, but is also likely to occur where the treatment is partly or wholly
unviable. Diagnostic checks (see Section 5.2) may also lead to observations being identified as
outliers and omitted from analysis.

If the missing responses can be considered to occur at random independently of the
treatment group or the (unseen) response, the pattern is known as missing completely at
random (MCAR, for example, see Carpenter and Kenward, 2013). For example, if machine
breakdown causes several field plot yields to be lost, this is likely to be unrelated to the
treatment or the yields. If the missing responses can be considered to occur at random
within each treatment group but independently of the (unseen) response, the pattern is
known as missing at random (MAR). For example, suppose a study on seedling vigour
uses 50 seeds from several varieties and measures biomass after seven days. Germination
rates vary between varieties but are not thought to be related to seedling vigour. The
values of biomass observed (conditional on germination) can then be considered as missing at
random. In either of these cases (MCAR or MAR), it is valid to analyse the set of observed
responses only and ignore the missing observations. If the pattern of missing responses is
directly related to the (unseen) value of the response, then the pattern is known as
missing not at random (MNAR, or sometimes NMAR). Here, missing responses cannot be
ignored without the introduction of potential bias, and so it is not valid to analyse only the
observed responses. For example, some machines have a lower limit of detection (LOD),
below which responses may be recorded as missing. Because the allocation as missing is
related directly to the response (i.e. small responses are more likely to be missing) these
data are MNAR. Ignoring these low values leads to an over-estimate of the response, and
this bias is larger for treatment groups with larger proportions of observations below the
LOD. In this specific case, the missingness is related to censoring, and techniques to adjust
for censoring are available (see Taylor, 1973). Another example occurs with the detection of
outliers, which usually depend on both the treatment group and the observed response.
Observations omitted because they have been identified as an outlier should therefore be
considered as MNAR. This interpretation highlights the potential dangers associated with
removal of outliers, and the limitations of any analysis excluding such outliers. In
general, it is necessary to use more advanced modelling techniques to deal with observations
MNAR (Carpenter and Kenward, 2013).
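The bias caused by a limit of detection can be seen in a small numerical sketch. All values below are hypothetical and chosen only for illustration: the LOD, the group names and the "true" responses are assumptions, not data from the book.

```python
from statistics import mean

LOD = 1.0  # hypothetical lower limit of detection

# Hypothetical true responses for two treatment groups (illustrative values only).
true_values = {
    "control": [0.6, 0.9, 1.2, 1.5, 1.8, 2.1],
    "treated": [0.2, 0.5, 0.8, 1.1, 1.4, 1.7],
}


def observed_only(values, lod=LOD):
    """Keep responses at or above the LOD; those below are recorded as missing."""
    return [v for v in values if v >= lod]


def bias(values, lod=LOD):
    """Mean of the observed responses minus the mean of all true responses."""
    return mean(observed_only(values, lod)) - mean(values)


for group, values in true_values.items():
    n_missing = len(values) - len(observed_only(values))
    print(f"{group}: {n_missing} missing, bias of observed mean = {bias(values):+.2f}")
```

In this toy setting the group with more responses below the LOD shows the larger upward bias, so the comparison between groups is distorted as well as the individual means.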

As stated above, where observations are MCAR or MAR, the subset of non-missing
observations can be analysed. But, except in the case of an unstructured data set (e.g.
CRD) with a single treatment factor, the presence of missing values usually leads to
non-orthogonality, even if the original design was orthogonal. Several algorithms have been
developed to preserve orthogonality for the case of a few missing values in orthogonal
designs. One example is the algorithm of Healy and Westmacott (1956), where missing
observations are estimated (by an iterative procedure) at the value of the treatment group
mean, which results in a zero residual for the observation. With this algorithm, the TrtSS
is inflated but the ResSS is not. The ANOVA table is then approximate, and the residual df
must be adjusted (i.e. reduced) to account for the missing observations. Estimates of
treatment effects and means are correct, but their standard errors do not take proper account of
the missing data and so will be under-estimated. Use of this type of algorithm will often
be satisfactory when the proportion of missing values is small, but not when a larger
proportion of the observations are missing. Beware that some statistical software applies this
type of algorithm automatically when missing values are present in the data.
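The iterative idea can be sketched for the simplest case of a single treatment group. This is a simplified illustration in the spirit of the Healy-Westmacott approach, not a full implementation of the published algorithm, and the data values are hypothetical.

```python
def fill_missing(group, start=0.0, tol=1e-10, max_iter=500):
    """Iteratively replace missing values (None) by the current fitted group mean."""
    filled = [start if v is None else v for v in group]
    missing = [i for i, v in enumerate(group) if v is None]
    for _ in range(max_iter):
        fitted = sum(filled) / len(filled)  # fitted value = current group mean
        change = max((abs(filled[i] - fitted) for i in missing), default=0.0)
        for i in missing:
            filled[i] = fitted              # gives a zero residual at this unit
        if change < tol:
            break
    return filled


def within_group_ss(values):
    """Sum of squared deviations about the group mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


group = [4.1, 3.9, None, 4.3]               # hypothetical responses, one missing
observed = [v for v in group if v is not None]
filled = fill_missing(group)

print("estimated missing value:", round(filled[2], 6))
print("ResSS, observed only   :", round(within_group_ss(observed), 6))
print("ResSS, after filling   :", round(within_group_ss(filled), 6))
print("residual df must be", len(observed) - 1, "not", len(filled) - 1)
```

The filled value converges to the mean of the observed values, so the residual sum of squares is unchanged, but software that counts the filled value as a real observation would use one residual df too many unless the df are reduced.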



EXAMPLE 11.3: ELISA CALIBRATION 

A calibration experiment was done to establish a suitable protocol for an experimental
procedure. Three methods of preparation (factor Prep) were tested in combination with
four different initial concentrations (factor Conc), with two replicates of each
combination. The solutions were applied in randomized order to an ELISA plate and processed.
The measured absorbances (variate Absorbance) are listed in Table 11.16 and stored in
file calibrate.dat.

This is a CRD so there is no structural component. The appropriate explanatory
component is crossed, i.e. [1] + Prep*Conc, with specific interest in the interaction term: if
this is large, then the preparation method has a differential effect on the response that
depends on the concentration. The readings were transformed to logarithms before
analysis. One reading (unit 9) was deemed invalid because of suspected contamination
and set missing. ANOVA tables obtained with either the Healy-Westmacott algorithm
or with missing responses ignored are in Table 11.17 and, in this case, the differences
between the two analyses are small.

The Healy-Westmacott algorithm preserves the orthogonal structure, and so the 
ANOVA table is invariant to the order of the terms. This is not the case when missing 
values are omitted, but the degree of non-orthogonality for only one missing value is 
small, and hence the alternative sequential ANOVA table obtained by adding the factors 
in the other order is very similar to that shown.

TABLE 11.16

Absorbances (Abs) from the ELISA Calibration Study with Four Concentrations of Substrate (Conc)
and Three Preparation Methods (Prep) (Example 11.3 and File calibrate.dat)

Unit  Prep  Conc  Abs        Unit  Prep  Conc  Abs        Unit  Prep  Conc  Abs
   1     1     3  0.482         9     2     3  -            17     3     4  0.056
   2     1     1  0.783        10     1     3  0.431        18     2     4  0.073
   3     2     1  1.014        11     3     1  1.001        19     3     2  0.808
   4     2     1  1.038        12     3     4  0.048        20     3     2  0.888
   5     1     4  0.092        13     1     4  0.130        21     1     2  0.780
   6     3     1  0.784        14     2     2  0.707        22     1     2  0.759
   7     3     3  0.327        15     1     1  0.766        23     3     3  0.364
   8     2     2  0.745        16     2     3  0.412        24     2     4  0.070

Source: Data from Rothamsted Research. 



TABLE 11.17

Sequential ANOVA Table for the ELISA Calibration Study with One Missing Observation Using
Either the Healy-Westmacott Algorithm or Ignoring the Missing Responses (Example 11.3)

                        Healy-Westmacott                     Ignoring Missing Responses
Source        df       SS      MS       VR        P            SS      MS       VR        P
+ Prep         2    0.161   0.080     7.30    0.010         0.159   0.079     7.21    0.010
+ Conc         3   23.512   7.837   711.90  < 0.001        23.506   7.835   711.74  < 0.001
+ Prep.Conc    6    0.583   0.097     8.83    0.001         0.583   0.097     8.82    0.001
Residual      11    0.121   0.011                           0.121   0.011
Total         22   24.369                                  24.369

Note: df = incremental df, SS = incremental sum of squares, MS = mean square, VR = variance ratio, P = observed
significance level.
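The Residual line of Table 11.17 can be checked directly from the data in Table 11.16, because under the full model [1] + Prep*Conc the residual sum of squares is the pooled within-cell variation of the observed log readings. The sketch below assumes natural logarithms (the text says only "logarithms"), and the function and variable names are illustrative.

```python
from collections import defaultdict
from math import log

# (unit, Prep, Conc, absorbance) from Table 11.16; None marks the missing unit 9.
DATA = [
    (1, 1, 3, 0.482), (2, 1, 1, 0.783), (3, 2, 1, 1.014), (4, 2, 1, 1.038),
    (5, 1, 4, 0.092), (6, 3, 1, 0.784), (7, 3, 3, 0.327), (8, 2, 2, 0.745),
    (9, 2, 3, None), (10, 1, 3, 0.431), (11, 3, 1, 1.001), (12, 3, 4, 0.048),
    (13, 1, 4, 0.130), (14, 2, 2, 0.707), (15, 1, 1, 0.766), (16, 2, 3, 0.412),
    (17, 3, 4, 0.056), (18, 2, 4, 0.073), (19, 3, 2, 0.808), (20, 3, 2, 0.888),
    (21, 1, 2, 0.780), (22, 1, 2, 0.759), (23, 3, 3, 0.364), (24, 2, 4, 0.070),
]


def residual_ss(rows):
    """ResSS and ResDF for the full model [1] + Prep*Conc, ignoring missing rows."""
    cells = defaultdict(list)
    for _unit, prep, conc, absorbance in rows:
        if absorbance is not None:
            cells[(prep, conc)].append(log(absorbance))  # natural log: an assumption
    ss = 0.0
    n_obs = 0
    for values in cells.values():
        cell_mean = sum(values) / len(values)
        ss += sum((v - cell_mean) ** 2 for v in values)
        n_obs += len(values)
    return ss, n_obs - len(cells)


res_ss, res_df = residual_ss(DATA)
print(f"ResSS = {res_ss:.3f} on {res_df} df, ResMS = {res_ss / res_df:.3f}")
```

The cell containing the missing unit contributes nothing to the residual sum of squares, which is why the Healy-Westmacott analysis and the analysis ignoring the missing response share the same Residual line.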



Later, another five observations (units 4, 11, 15, 17 and 20, all from different treat- 
ments) were identified as having been subject to contamination and so the analysis was 
rerun with these observations also set missing. Two sequential ANOVA tables for the 
full model [1] + Prep*Conc are in Table 11.18. 

Both ANOVA tables have the same (correct) estimate of background variation derived
from the ResMS. But the inflated interaction sum of squares in the approximate (Healy-
Westmacott) ANOVA table suggests that there is strong evidence of an interaction,

TABLE 11.18

Sequential ANOVA Table for the ELISA Calibration Study with Six Missing Observations Using
Either the Healy-Westmacott Algorithm or Ignoring the Missing Responses (Example 11.3)

                        Healy-Westmacott                     Ignoring Missing Responses
Source        df       SS      MS       VR        P            SS      MS       VR        P
+ Prep         2    0.290   0.145    11.69    0.009         0.112   0.056     4.52    0.064
+ Conc         3   23.465   7.822   630.83  < 0.001        16.818   5.606   452.13  < 0.001
+ Prep.Conc    6    0.568   0.095     7.64    0.013         0.367   0.061     4.94    0.037
Residual       6    0.074   0.012                           0.074   0.012
Total         17   17.371                                  17.371

Note: df = incremental df, SS = incremental sum of squares, MS = mean square, VR = variance ratio, P = observed
significance level.

FIGURE 11.3

Predicted absorbance (a) on log scale and (b) back-transformed for four relative concentrations and three
methods of preparation (1, 2 and 3, distinguished by plotting symbol) (Example 11.3). The horizontal axis
in both panels is relative concentration.

whereas the exact ANOVA table (ignoring the missing responses) has a larger observed
significance level, although still statistically significant at the 5% level. This demonstrates
the possible inflation of significance under the Healy-Westmacott algorithm as the
number of missing values increases. Again, as the Healy-Westmacott analysis preserves the
orthogonal structure, its ANOVA table is invariant to the order of fitting. This is not the
case when the missing values are omitted, and the sequential ANOVA table obtained
by fitting factor Conc first is now quite different (not shown) although the same sum of
squares and observed significance level are obtained for the interaction term.

The fitted model (with the six disputed observations omitted) is shown on the log
scale and back-transformed in Figure 11.3. The prediction SEs on the log scale are 0.079
for treatment combinations with two observations and 0.111 for treatment combinations
with one observation, and the SEDs range from 0.111 to 0.157. The absorbances decrease
across the concentrations, but it appears that preparation methods 1 and 3 show no
difference in response between the first two concentrations, whereas method 2 shows a
consistent decrease in absorbance across the full set.
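The quoted prediction SEs and SEDs follow from the residual line of Table 11.18. A minimal sketch of the arithmetic, using the tabulated ResSS and ResDF (so results are limited by the rounding of those values) and illustrative function names:

```python
from math import sqrt

RES_SS = 0.074  # residual sum of squares from Table 11.18
RES_DF = 6      # residual df from Table 11.18
res_ms = RES_SS / RES_DF


def prediction_se(n):
    """SE of a predicted treatment-combination mean based on n observations."""
    return sqrt(res_ms / n)


def sed(n1, n2):
    """SE of the difference between two means based on n1 and n2 observations."""
    return sqrt(res_ms * (1 / n1 + 1 / n2))


print("SE, two observations:", round(prediction_se(2), 3))
print("SE, one observation :", round(prediction_se(1), 3))
print("smallest SED (2 v 2):", round(sed(2, 2), 3))
print("largest SED (1 v 1) :", round(sed(1, 1), 3))
```

The smallest SED arises when both treatment combinations retain two observations and the largest when both retain only one, which gives the range of SEDs quoted in the text.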



11.5 Incorporating the Effects of Unplanned Factors 

Another frequent cause of non-orthogonality is the occurrence of unplanned events 
within a designed experiment or observational study. For example, if pigeons graze a 
field experiment unevenly, then we can classify plots as wholly, partially or not grazed to 
quantify the damage done and then incorporate this new factor into the model to account 
for the effect of grazing on yield. Because the presence of such variables is (by definition) 
unplanned, they usually result in a non-orthogonal structure. For example, pigeons are 
unlikely to distribute their grazing evenly across treatments, and so the grazing factor 
is likely to be non-orthogonal to the treatment factor(s). We use the term extraneous for 
variables that are unrelated to the original design. In this section, we consider only extra- 
neous variables that are qualitative (i.e. factors); quantitative extraneous variables (i.e. 
variates) are dealt with by a technique called analysis of covariance, which is described 
in Section 15.5.

Usually, the aim of analysis is to investigate whether treatments differ after the
extraneous factors have been taken into account, and hence these factors should be added into the
explanatory component before treatment terms. This leads to adjusted estimates of
treatment effects, with the extent of the adjustment depending on the degree of
non-orthogonality between the treatment and extraneous factors. The adjusted treatment effects
should be more robust after the extraneous factors have been eliminated (corrected for),
and including the extraneous factors usually reduces the estimate of background
variation, and leads to more precise estimates and more sensitive tests.

It can also be useful to investigate whether there is any evidence of an interaction 
between the extraneous and treatment factors, but there may be practical difficulties in 
doing so. On the one hand, if only a small proportion of the full set of treatment × extraneous
factor combinations is observed, evaluation of their interaction is based on a small
amount of information and may be unreliable. On the other, if all of the combinations are 
present but unreplicated then fitting this interaction is uninformative (as each combina- 
tion will be fitted exactly) and may leave insufficient ResDF for a reliable test. However, it 
is important to understand that omitting this interaction requires an implicit assumption 
that the interaction is zero. 

EXAMPLE 11.4: PLANT HEIGHTS IN GLASSHOUSE* 

A glasshouse experiment was done to investigate the effect of the dose of a growth regu- 
lator on plant height under controlled conditions. Six increasing doses (factor Dose) were 
each applied to four replicate plants in separate pots that were arranged according to a 
CRD in a grid layout consisting of four rows (factor Row) and six columns (factor Column) 
on a bench. Plant heights (cm, variate Height) were measured six weeks later. The data are 
listed in the experimental layout in Table 11.19 and are held in file heights.dat. 

Preliminary analysis of the plant heights, using explanatory component [1] + Dose, 
revealed a trend in the plot of standardized residuals against fitted values (see Figure 
11.4a). Further investigation showed that this was due to a strong pattern of increasing 
residual value across columns (see Figure 11.4b). 

The glasshouse manager suggested that this pattern was real, as it could be explained
by differential shading on one end of the bench. Columns of the design were therefore
incorporated as an extraneous factor (called Column) in the analysis, with explanatory
component [1] + Column + Dose. Table 11.20 shows the ANOVA tables for the models
excluding and including this extraneous factor. There was strong evidence of
differences between doses from the original model that ignored columns ($F_{5,18}$ = 4.03).
However, the variance ratio for the term Dose increased greatly once the effect of
columns was eliminated ($F_{5,13}$ = 18.83), as the shading effect was partly masking treatment
differences. The predicted means for each dose before and after correction for column
effects are shown with 95% CIs in Figure 11.5. When the column effects are ignored

TABLE 11.19

Layout and Heights (cm) for Plants in Pots in a Glasshouse Experiment (Example 11.4 and File
heights.dat)

                                       Column of Layout
Row of Layout    1           2           3           4           5           6
      1          (1) 58.4    (3) 56.8    (4) 61.8    (4) 68.2    (5) 61.6    (3) 70.3
      2          (3) 55.5    (4) 57.2    (6) 50.8    (6) 57.4    (1) 70.0    (5) 61.1
      3          (6) 49.9    (2) 60.9    (6) 50.7    (1) 71.3    (5) 61.3    (4) 70.0
      4          (2) 56.7    (3) 54.5    (2) 61.3    (2) 66.6    (5) 65.8    (1) 69.8

Note: The dose level (1-6) applied to each plant is given within parentheses.

FIGURE 11.4

Plots of standardized residuals against (a) fitted values and (b) column positions using explanatory component
[1] + Dose (Example 11.4).



TABLE 11.20

ANOVA Table for Plant Heights from a Glasshouse Experiment with Treatment Effects (Factor
Dose) Fitted Either Ignoring or Eliminating the Extraneous Factor Column (Example 11.4)

                       Ignoring Columns                         Eliminating Columns
Source       df      SS      MS      VR       P        df      SS      MS      VR        P
+ Column      -       -       -       -       -         5   618.9   123.8   33.49  < 0.001
+ Dose        5   536.1   107.2    4.03   0.012         5   348.1    69.6   18.83  < 0.001
Residual     18   478.8    26.6                        13    48.1     3.7
Total        23  1015.0                                23  1015.0

Note: SS = sum of squares, MS = mean square, VR = variance ratio, P = observed significance level.



FIGURE 11.5

Predicted plant heights (cm) with 95% CIs, plotted against relative dose, (a) ignoring column (SED = 3.65,
18 ResDF) and (b) eliminating column effects using explanatory component [1] + Column + Dose
(min SED = 1.14, max SED = 2.10, 13 ResDF) (Example 11.4).

(Figure 11.5a), it appears that only dose 6 restricts height. After adjusting for column
effects (Figure 11.5b), prediction SEs (and hence CIs) are much smaller, and both doses 5 
and 6 restrict height. The reduction in the estimate for dose 5 is large because this treat- 
ment occurred only in columns 5 and 6 on the bench, which had the largest responses. 
Clearly, any future experiment using this bench should use column as a blocking factor 
so that the shading can be accommodated in a more efficient manner and, ideally, as 
orthogonal to treatments. 
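The two analyses in Table 11.20 can be sketched from the data in Table 11.19 using only the Python standard library. The least-squares routine below (a modified Gram-Schmidt projection that skips aliased dummy columns) is a generic stand-in chosen for this illustration; it is not the algorithm used by any particular statistical package, and all function names are assumptions.

```python
# Rows of the bench layout from Table 11.19, each plot stored as (dose, height in cm).
LAYOUT = [
    [(1, 58.4), (3, 56.8), (4, 61.8), (4, 68.2), (5, 61.6), (3, 70.3)],
    [(3, 55.5), (4, 57.2), (6, 50.8), (6, 57.4), (1, 70.0), (5, 61.1)],
    [(6, 49.9), (2, 60.9), (6, 50.7), (1, 71.3), (5, 61.3), (4, 70.0)],
    [(2, 56.7), (3, 54.5), (2, 61.3), (2, 66.6), (5, 65.8), (1, 69.8)],
]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fit(y, columns, tol=1e-9):
    """Least-squares fit by modified Gram-Schmidt; returns (residual SS, model rank)."""
    basis, resid = [], list(y)
    for col in columns:
        v = list(col)
        for q in basis:
            c = dot(v, q)
            v = [a - c * b for a, b in zip(v, q)]
        norm2 = dot(v, v)
        if norm2 > tol:  # skip aliased (redundant) columns
            q = [a / norm2 ** 0.5 for a in v]
            basis.append(q)
            c = dot(resid, q)
            resid = [a - c * b for a, b in zip(resid, q)]
    return dot(resid, resid), len(basis)


def indicators(labels):
    """One 0/1 dummy column per factor level."""
    return [[1.0 if lab == level else 0.0 for lab in labels]
            for level in sorted(set(labels))]


height, dose, column = [], [], []
for row in LAYOUT:
    for col_index, (d, h) in enumerate(row, start=1):
        height.append(h)
        dose.append(d)
        column.append(col_index)

n = len(height)
const = [[1.0] * n]
rss_null, _ = fit(height, const)
rss_dose, rank_dose = fit(height, const + indicators(dose))
rss_col, _ = fit(height, const + indicators(column))
rss_both, rank_both = fit(height, const + indicators(column) + indicators(dose))

ss_dose_ignoring = rss_null - rss_dose      # Dose, ignoring Column
ss_column = rss_null - rss_col              # Column, fitted first
ss_dose_eliminating = rss_col - rss_both    # Dose, eliminating Column
vr_ignoring = (ss_dose_ignoring / 5) / (rss_dose / (n - rank_dose))
vr_eliminating = (ss_dose_eliminating / 5) / (rss_both / (n - rank_both))

print(f"Dose ignoring Column   : SS = {ss_dose_ignoring:.1f}, VR = {vr_ignoring:.2f}")
print(f"Column                 : SS = {ss_column:.1f}")
print(f"Dose eliminating Column: SS = {ss_dose_eliminating:.1f}, VR = {vr_eliminating:.2f}")
```

The Dose sum of squares changes between the two fits because Dose and Column are non-orthogonal, while the much smaller residual mean square after eliminating columns is what drives the increase in the variance ratio.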



11.6 Analysis Approaches for Non-Orthogonal Designs 

Statistical software often includes several algorithms for the analysis of linear models. For 
example, most packages include algorithms to produce multi-stratum ANOVA or to fit a 
linear regression (Chapters 12 to 15), generalized linear model (GLM; Chapter 18) or linear 
mixed model (LMM; Chapter 16). Different algorithms present estimates in different formats,
possibly using different parameterizations. But it is important to understand that,
when they can specify the same model, two different algorithms using the same method 
will produce equivalent results. Alternative algorithms are typically provided in statistical 
software because different approaches can be used to take advantage of special cases: the 
most general algorithms can be inefficient for simpler models. 

The simplest case occurs when a study consists of a set of unstructured units, so that the 
structural component of the model is not required. Any general algorithm for analysis of 
linear models should be able to process a study of this form. This is not the case when both 
explanatory and structural components are present, which ideally requires an algorithm 
that recognizes the separate roles of the two components of the model. 

The provision of commands to produce a multi-stratum ANOVA is probably the most 
important requirement of software to analyse designed experiments with blocking or 
other structure. Unfortunately, the multi-stratum ANOVA can be defined only for orthog- 
onal block structures, and where the treatment structure satisfies certain conditions of 
balance. In general terms, a block structure is orthogonal if all units at a given level of 
the hierarchy each contain the same number of units from a lower level. For example, in a 
RCBD, all blocks contain the same number of plots; hence, this is an orthogonal blocking 
structure. Similarly, the BIBD, LS and SP designs described in Chapter 9 all have orthogo- 
nal blocking structures (although the block and treatment structures are non-orthogonal 
for the BIBD, see Section 9.3). 
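The "in general terms" description of an orthogonal block structure can be turned into a simple check. The sketch below is a simplified reading of that description, assuming that each unit is identified by a tuple of nested labels such as (block, plot) or (block, whole plot, subplot); it is not a formal test of orthogonality.

```python
from collections import defaultdict


def has_equal_nesting(units):
    """True if, at every level of the hierarchy, each group contains the same
    number of distinct groups (or units) from the level immediately below."""
    depth = len(units[0])
    for level in range(1, depth):
        children = defaultdict(set)
        for unit in units:
            children[unit[:level]].add(unit[:level + 1])
        if len({len(group) for group in children.values()}) > 1:
            return False
    return True


# An RCBD-like structure: 5 blocks, each containing 5 plots.
rcbd = [(block, plot) for block in range(1, 6) for plot in range(1, 6)]

# The same structure after losing one plot from block 1.
unequal = [unit for unit in rcbd if unit != (1, 5)]

print(has_equal_nesting(rcbd))     # every block holds five plots
print(has_equal_nesting(unequal))  # block 1 now holds only four plots
```

The check concerns the block structure only: a BIBD passes it even though its blocks and treatments are non-orthogonal, as noted above.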

For non-orthogonal block structures or unbalanced designs with a structural compo- 
nent, a more general procedure is required. Algorithms for linear mixed models (LMMs) 
can be used for this purpose, and these methods are introduced in Chapter 16. If facili- 
ties for LMMs are not available, then the explanatory and structural components must 
be combined into a single model. If most treatment comparisons are made within blocks, 
then it is often possible to get a good analysis by specification of the model terms in an 
order that mimics the multi-stratum analysis. However, this approach requires a good
understanding of the experimental structure and loses information on treatment com- 
parisons made between blocks; this strategy is known as the intra-block analysis. In the 
remainder of this section, we focus on strategies to produce an intra-block analysis that 
gives a reasonable approximation to the full analysis using both the explanatory and 
structural components.

11.6.1 A Simple Approach: The Intra-Block Analysis

In the simplest case, in which treatment effects are estimated within the lowest stratum, an
intra-block analysis can be obtained by specification of blocking or other structural factors
(and any extraneous factors) before treatment terms in the explanatory component. The
structural term corresponding to the deviations (lowest stratum) should always be
omitted in this context. For example, in a BIBD, the intra-block analysis can be obtained from

Explanatory component: [1] + Blocks + Treatments

The fitted model provides estimates of treatment effects adjusted for (eliminating) block
effects. When a treatment term has deliberately been completely confounded with
blocking or structure at some level, that treatment term should be fitted before the confounded
structural term. For example, for the SP design presented in Example 9.2, the irrigation
effects are confounded with whole plots within blocks, and so the combined explanatory
component would be written as

Explanatory component: [1] + Block + Irrigation + Block.WholePlot + Species
                       + Irrigation.Species

This results in the same mean squares as in the multi-stratum ANOVA, but some care is
required to obtain the correct variance ratios. The Irrigation mean square must be divided
by the Block.WholePlot mean square, but the Species and Irrigation.Species mean
squares must be divided by the ResMS. Following these principles, one can reconstruct the
multi-stratum ANOVA table. Unfortunately, the confounding between the Irrigation and
Block.WholePlot terms induces dependencies that make the parameterization, and hence
the estimated Irrigation effects, difficult to interpret. For the same reason, it is difficult to
accommodate pseudo-replication within the explanatory component (although this can be
avoided by analysis of means of the pseudo-replicates in cases of equal replication).

In general, the intra-block analysis with a single model formula is sensible only when 
most of the treatment differences are estimated at the lowest level of the structure. In other 
cases, use of LMMs (see Chapter 16) is preferable. An example of intra-block analysis for a 
design with non-orthogonal blocks and treatments is presented below. 



EXAMPLE 11.5: EFFECT OF TYPE AND SIZE OF CUTTING ON WILLOW YIELD*

A field experiment was designed to investigate whether the type of cutting planted 
affects the subsequent growth of willows. Cuttings of five different types (A-E, factor 
Type) were to be planted, and growth parameters would be measured over the follow- 
ing seasons, including yield at the end of the first year. At planting time, it was realized 
that the cuttings to be planted varied greatly in size, and that this might also have an 
effect on subsequent growth. Two options were considered here. Cutting size could 
be confounded with blocks, so that each block contained cuttings of the one size only. 
Alternatively, cutting size could be investigated as an extraneous factor, in addition to 
type. The second option was taken, and cuttings were classified as small (S), medium 
(M) or large (L, factor Size). Not all of the type × size combinations were available, and
the total number of plots was fixed at 25. The design was based on a five-block RCBD 
with respect to cutting type, and the different sizes were allocated in as balanced a way 
as possible across blocks (factor Block) and cutting types. The yield (variate Yield) with 
allocation of size and type combinations to the five blocks is shown in Table 11.21 and 
stored in file cuttings.dat.

TABLE 11.21

Yield after First Year for Willows Grown from Cuttings of Different Types (A-E) and Size (S, M, L)
in Five Randomized Blocks of Five Plots (Example 11.5 and File cuttings.dat)

Block  Plot  Type  Size  Yield        Block  Plot  Type  Size  Yield
  1      1    D     M    15.16          4      1    E     M    23.84
  1      2    E     L    18.31          4      2    D     L    21.30
  1      3    A     M    23.94          4      3    B     M    24.51
  1      4    B     S    24.37          4      4    A     S    25.77
  1      5    C     L    12.04          4      5    C     M    18.34
  2      1    D     L    15.30          5      1    A     S    23.01
  2      2    A     L    19.81          5      2    C     L    14.74
  2      3    E     M    16.45          5      3    E     M    21.67
  2      4    B     S    18.45          5      4    D     S    17.30
  2      5    C     M    17.28          5      5    B     M    16.63
  3      1    B     M    24.56
  3      2    A     M    24.60
  3      3    C     S    25.11
  3      4    D     M    22.90
  3      5    E     L    25.71

Each cutting type appears once in each block, and each size appears at least once in
each block, although the size × type combinations are unequally replicated, and two
combinations (S × E and L × B) were not available (Table 11.22).

Since the design is unbalanced, a multi-stratum ANOVA cannot be formed. However, 
the treatment information for the cutting type main effect could be retrieved from 
within-block comparisons (as all types occur once in each block) and so an approximate 
analysis should be acceptable here, as this is the main focus of interest. Block effects 
(factor Block) must be added into the model first, followed by the extraneous factor Size, 
so that cutting size can be eliminated before comparing cutting types (factor Type). As 
a preliminary model, we include the interaction between cutting size and type (term 
Size.Type) to give 

Explanatory component: [1] + Block + Size + Type + Size.Type 

In the resulting ANOVA table (Table 11.23), there is strong evidence of differences in
yield between blocks ($F^{B}_{4,8}$ = 7.385, P = 0.009), and a suggestion of a difference between



TABLE 11.22

Occurrence of Cutting Type × Size Combinations
in the Willow Yield Trial (Example 11.5)

                       Cutting Type
Cutting Size     A     B     C     D     E     Total
S                2     2     1     1     0       6
M                2     3     2     2     3      12
L                1     0     2     2     2       7
Total            5     5     5     5     5      25

TABLE 11.23

Sequential ANOVA Table for the Willow Yield Trial (Example 11.5)

Change         df   Sum of Squares   Mean Square   Variance Ratio         P
+ Block         4      186.4298        46.6074     $F^{B}$ = 7.385      0.009
+ Size          2       49.0181        24.5090     $F^{S}$ = 3.883      0.066
+ Type          4       96.9158        24.2290     $F^{T}$ = 3.839      0.050
+ Size.Type     6       21.0791         3.5132     $F^{S.T}$ = 0.557    0.754
Residual        8       50.4896         6.3112
Total          24      403.9324


sizes ($F_{2,8}$ = 3.883, P = 0.066), although this term was partially confounded with blocks
and so may be masked by block differences. There was some evidence of a difference
between cutting types ($F_{4,8}$ = 3.839, P = 0.050). There was no evidence of an interaction
between cutting size and type ($F_{6,8}$ = 0.557, P = 0.754) and so this term was dropped
and the model refitted, with all other terms being retained.
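The variance ratios quoted above can be checked directly from the sums of squares and degrees of freedom printed in Table 11.23. The following is a minimal sketch in plain Python; the variable names are illustrative only, and the P-values are not recomputed because that would need an F-distribution routine from a statistics library.

```python
# Sequential ANOVA quantities copied from Table 11.23: term -> (df, sum of squares).
terms = {
    "Block": (4, 186.4298),
    "Size": (2, 49.0181),
    "Type": (4, 96.9158),
    "Size.Type": (6, 21.0791),
}
res_df, res_ss = 8, 50.4896

# The residual mean square is the denominator of every variance ratio.
res_ms = res_ss / res_df

variance_ratios = {}
for name, (df, ss) in terms.items():
    mean_square = ss / df
    variance_ratios[name] = mean_square / res_ms
    print(f"+ {name:<10} df={df}  MS={mean_square:8.4f}  F={variance_ratios[name]:.3f}")

# The sequential sums of squares plus the residual add up to the total.
total_ss = sum(ss for _, ss in terms.values()) + res_ss
```

Each ratio is compared with an F-distribution on (term df, 8) degrees of freedom, which is where the subscripts in the text come from.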

Predictions of yield for different cutting sizes can be calculated from the table of fitted
values obtained from explanatory component [1] + Block + Size + Type. This gives
a three-way table classified by Block, Size and Type, and marginal means can be taken
for cutting type to give the predictions shown in Figure 11.6a.

Cutting types C and D produced the least yield, with larger yields produced by
types A and E, and type B intermediate. For comparison, Figure 11.6b shows the predicted
means obtained if the effect of cutting size is ignored, i.e. from explanatory
component [1] + Block + Type. The two sets of predictions are very similar, partly
because cutting sizes are reasonably balanced across the different types (so the analysis
is close to orthogonal), and partly because the effects of cutting size are quite
small. Predictions for cutting size are listed in Table 11.24, and it appears that first
year yield is somewhat greater for smaller cuttings, with medium and large cuttings
producing similar yield.



[Figure 11.6 panels: (a) Eliminating cutting size; (b) Ignoring cutting size. Horizontal axis in both panels: Cutting type.]



FIGURE 11.6

Predicted yields with 95% CIs (a) eliminating cutting size using explanatory component [1] + Block + Size + Type
(min SED = 1.449, max SED = 1.538, 14 ResDF) and (b) ignoring cutting size (SED = 1.565, 16 ResDF) (Example 11.5).


TABLE 11.24

Predicted Yield with SE for Cutting Sizes from Explanatory
Component [1] + Block + Size + Type (Example 11.5)

                         Size
               Small     Medium    Large
Prediction     22.394    20.109    19.347
SE             0.9889    0.6667    0.9190

Note: Maximum LSD (5% significance level) = 3.0335.



EXERCISES 

11.1* Thirty-two moths were assigned at random to separate flight mills, and the
distance (m) flown by each moth during one night was measured electronically.
The species (A, B or C) and sex (F, M) of each moth were recorded. File
flight.dat contains the mill number (Mill), with the species, sex and distance
flown (factors Sex, Species, variate Distance) for the moth in each mill. The
aim of the statistical analysis is to investigate whether there are any consistent
differences among species or between sexes, and whether any difference
between sexes is consistent across species.

a. Write down an explanatory model for this experiment in terms of the two
explanatory factors (Sex and Species). Consider the replication of each of
the factor combinations and decide whether this structure is orthogonal.

b. Fit your model with the log10-transformed distances as your response variable.
Use two different orders for adding terms into the model and explain
the differences in the corresponding sequential ANOVA tables.

c. Identify the best predictive model for these data. Interpret your model and
produce predictions with SE for the distance flown overnight by each sex
and species of moth.

11.2 Weeds within a crop can greatly decrease yield and there is interest in the
impact of different weed species, both alone and in combination. A RCBD with
two blocks of 30 plots was set up to investigate the effect of different densities
of barley and chickweed on the yield of a linseed crop. There were 29 treatments
in total: a factorial combination of five densities of barley with five densities
of chickweed (25 treatments), with duplicates of the control (no weeds),
plus two higher densities of each of the individual species (four treatments).
File density.dat holds the unit numbers (ID), structural factors (Block, Plot),
the applied seed rate of barley and chickweed (variates B, C) and the resulting
grain yield (variate Grain).†

Create factor versions of the weed density variates. Write down an explanatory
model in terms of these factors, identify the structural component of the
model for this trial, and fit the model using both components (the intra-block
analysis). Is there any evidence of an interaction between the weed species?
Identify the predictive model and write down its form. Produce predictions
with SE for each combination of weed seed densities present in the trial. (We
re-visit these data in Exercise 17.9.)

† Data from P. Lutman, Rothamsted Research.

11.3 A glasshouse trial was set up to investigate the profit associated with different
methods of growing peppers (Mead et al., 2012, Chapter 7). The experiment
continued over two years, and two blocks of six glasshouse compartments were
used in each year (24 compartments in total), with one set of conditions applied
to each compartment. The eight treatments were all combinations of standard
(0) or enhanced (1) levels of heat, light and CO2. All treatments were tested in the
first year (four tested twice, four tested once) and the best five were tested again
in the second year (with one treatment repeated in each block). The unit numbers
(ID), structural factors (Year, Block, DComp) and treatment factors (Heat,
Light, CO2) are held in file peppers.dat, with a measure of profit (yield remaining
after accounting for costs, variate Profit).

Analyse the profit, taking account of the experimental structure (years and
blocks), as well as the three treatment factors. Identify a sensible predictive
model and suggest which combination of the three factors should be used in
practice to maximize profit. (We re-visit these data in Exercise 16.3.)

11.4 In Exercises 6.3 and 8.5 you analysed the score of potato scab from a CRD.
Repeat your analysis and plot the residuals in field layout (as defined by the
factors Row and Col provided in file scab.dat). Identify any clear spatial trend
and add suitable extraneous factors into the model to account for this. Compare
your new model with the original, and comment on whether the increased
complexity is justified.

11.5* A NIRS machine was used to measure the protein content of 35 accessions of
wheat. Sets of six samples were analysed together in each run of the machine
and the measurements were made in seven pairs of replicate runs. Each pair of
runs used subsamples of seed from the same five accessions with a standard
control sample (used in all runs). File nirs.dat holds the unit numbers (ID),
structural factors (Pair, Rep), information on lines (factors Type, Accession)
and protein measurements (variate Protein). Write down the explanatory and
structural components of the model for this trial, and fit the model using both
components (the intra-block analysis). Is there evidence of variation in protein
content between the 35 accessions? What are the issues with the design of
this experiment? Can you suggest a better design? (We re-visit these data in
Exercise 16.3.)

11.6 Five pruning treatments were tested on apple trees (Pearce, 1965, Section 6.2). A
balanced incomplete block design was used to allocate the five treatments (a-e)
to four branches on each of 15 trees (60 branches in total). One of the outcomes
measured was the length of shoots from the middle third of each branch, but
this was only measured for treatments a, b and d. The shoot lengths (variate
Length) are in file shoot.dat with the unit numbers (ID), structural factors
(Tree, Branch) and treatment factor (Treatment). Analyse these data, accounting
for possible differences between trees as well as treatments. Can you identify
which treatment produces the longest shoots? (We re-visit these data in
Exercise 16.3.)

11.7 An experiment using a Latin square design was intended to compare the 
yield of six varieties of turnip, but three plots were damaged by vandals 
before harvest and their yield could not be obtained (Hand et al., 1994, Data 
Set 78). The yield (variate FreshWt, fresh weight in pounds per plot), plot numbers
(Plot) and the design (Row, Column) and treatment (Variety) factors are
held in file vandal.dat. Write down a model for this trial that recognizes the
structure of the Latin square design. Compare analysis of the yield by multi- 
stratum ANOVA with Healy-Westmacott estimation of missing values to an 
intra-block analysis excluding the missing plots. Do the two methods give the 
same conclusions in terms of variety comparisons? (We re-visit these data in 
Exercise 16.3.) 



12 



Models for a Single Variate: Simple Linear 
Regression 



In Chapter 1, we described experimental and observational studies as scientific enquiries
in which an outcome (or response) is investigated with the objective of understanding
how it is affected by the experimental conditions. In this context, statistical models are
used to quantify relationships between the response variable and one or more explanatory
variable(s) that define the conditions. Two simple examples were illustrated in Section
1.3, where the single explanatory variable corresponded to either a qualitative variable (or
factor, Example 1.1) or a quantitative variable (or variate, Example 1.2). In Chapter 4, we
presented details of the analysis for data classified by a single explanatory factor, including
the form of the underlying model, parameter estimation and statistical inference. This
was mainly placed in the context of designed experiments. We now focus on the analysis
of data where the single explanatory variable is quantitative, or a variate. This is usually
known as regression analysis. However, the situation with either a qualitative or a quantitative
explanatory variable results in the same basic form of linear model (Section 1.4).
Both consist of a systematic component and a random component, with analysis based on
the same underlying statistical theory to estimate parameters and predict from the fitted
model. In this chapter, we are concerned only with models including a single explanatory
variate. More broadly, regression analysis refers to the more general approach with
any number of quantitative (and qualitative) explanatory variables. In Chapter 14, we
shall extend the model to incorporate two or more explanatory variates (multiple regression),
and in Chapter 15, we consider models that also include qualitative explanatory
variables (factors), sometimes called regression with groups. In Chapter 16, we use linear
mixed models to take account of the structure in the observations, including blocking
and pseudo-replication.

Here, we begin by presenting the simple linear regression (SLR) model (Section 12.1)
followed by parameter estimation (Section 12.2). ANOVA assesses whether the variation
in the response that is associated with the explanatory variate is large in comparison to
the background variation (Section 12.3), and also provides an estimate of this background
variation used for inference on the model parameters (Section 12.4). A primary purpose of
regression analysis is prediction of the response at given values of the explanatory variate
(Section 12.5). Goodness-of-fit statistics can be used to assess the quality of the fitted
model or to compare it against other potential models (Section 12.6). Uncertainty in the
explanatory variable changes the interpretation of the model, and we discuss this issue in
Section 12.7. We then show how to use replication to formally evaluate the fit of the model
(Section 12.8). Finally, we describe two simple variations on the basic SLR model - the use
of standardized explanatory variates and regression through the origin - and explain the
process of inverse prediction (Section 12.9).



12.1 Defining the Model

The simplest model that can be used to describe the relationship between a response variable
and a quantitative explanatory variable (or variate) takes the form of a straight line
passing through the scatter of points arising when values of the response variable are
plotted against the corresponding values of the explanatory variate. This type of model is
known as a simple linear regression (SLR) and it can be represented mathematically as

$$y_i = \alpha + \beta x_i + e_i , \tag{12.1}$$

where $y_i$ and $x_i$ are the values of the response and the explanatory variates, respectively,
for the ith observation. The quantity $e_i$ represents a random deviation for the ith observation,
and the subscript i ranges from 1 to N, where N is the total number of observations.
The model is a straight line defined in terms of the model parameters $\alpha$ and $\beta$, as shown in
Figure 12.1. Parameter $\alpha$ (often called the intercept or constant parameter) corresponds to
the point at which the line intercepts the y-axis, and is the value of the straight line when
the explanatory variate is equal to 0. Parameter $\beta$, the coefficient of the explanatory variate,
is the slope (gradient) of the line, i.e. the change in the response produced by a unit change
in the explanatory variate. The SLR is called simple because it contains a single explanatory
variable and linear because the response is expressed in a linear form, i.e. as a sum of terms
that each consist of a coefficient multiplied by an explanatory variable.
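To make the roles of the intercept, slope and deviation in Equation 12.1 concrete, the following minimal sketch generates responses from a SLR model. The function name, parameter values and explanatory values are hypothetical choices for illustration; the deviations are drawn from a Normal distribution with standard deviation `sigma`, an assumption that is introduced formally with the model assumptions.

```python
import random


def simulate_slr(x_values, alpha, beta, sigma, seed=0):
    """Return responses y_i = alpha + beta * x_i + e_i, with e_i ~ Normal(0, sigma^2)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [alpha + beta * x + rng.gauss(0.0, sigma) for x in x_values]


x = [0.0, 1.0, 2.0, 3.0, 4.0]

# With sigma = 0 there are no deviations, so every point lies exactly on the line:
# the value at x = 0 is the intercept, and each unit step in x adds the slope.
y_noise_free = simulate_slr(x, alpha=2.0, beta=0.5, sigma=0.0)

# With sigma > 0 the observations scatter vertically about the same line.
y_noisy = simulate_slr(x, alpha=2.0, beta=0.5, sigma=1.0)
```

Comparing `y_noise_free` with `y_noisy` shows the systematic and random components of the model separately.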

The deviation $e_i$ represents the random (stochastic or probabilistic) element of the model,
sometimes called the random noise or residual error. It can be visualized as the vertical
displacement of the ith observation from the line (see Figure 12.1). In Section 4.1, we
described the deviations as representing background variation about group means. In the
context of regression analysis, background variation reflects the discrepancy in response
between measurements from two units with the same value of the explanatory variate.
However, here, we are fitting a structured model, in the form of a straight line, and if the
pattern in the observations does not match the form of the model, then the deviations also



FIGURE 12.1

Representation of a SLR model, showing the line ( — ), with one observation (•) and its predicted value (o).

encompass the systematic difference between that pattern and the straight line. This is
undesirable, as the model then does not capture the trend and the deviations no longer
represent purely random variation. Therefore, you should always view the data before
fitting the model, for example by using a scatter plot, to check that the underlying relationship
could plausibly be a straight line. We consider the implications of this below and give
methods for graphically examining the model fit in Chapter 13.

For the SLR model in Equation 12.1, the systematic component of the model is the straight
line, $\alpha + \beta x_i$, and the random component is the deviation, $e_i$. Recall that in Chapter 4, we
introduced a symbolic notation to represent linear models. This notation specifies the
response variable and the systematic component of the model in terms of the explanatory
variables. This reflects the form of model specification required for statistical software,
although the details vary according to the package used. For SLR, the straight line relationship
is the explanatory component of the model, and we write the model in symbolic
form as

Response variable: y

Explanatory component: [1] + x

where the variate y contains the observed responses. The term [1] denotes a variate that
takes value 1 everywhere, and is associated with the intercept parameter, $\alpha$. The variate
x contains the values of the explanatory variable and is associated with the slope parameter
$\beta$.
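One way to read the symbolic form [1] + x is as a table with one row per observation and one column per term: a column of ones for [1] and a column holding the explanatory variate. The sketch below illustrates this reading; the function names are hypothetical and do not correspond to any particular statistical package.

```python
def design_matrix(x_values):
    """One row per observation: the constant term [1], then the variate x."""
    return [[1.0, float(x)] for x in x_values]


def systematic_component(row, alpha, beta):
    """Sum of terms, each a coefficient multiplied by a column of the design matrix."""
    return alpha * row[0] + beta * row[1]


rows = design_matrix([2, 3, 5])
line_values = [systematic_component(row, alpha=1.0, beta=2.0) for row in rows]
```

Because the first column is always 1, its coefficient contributes the same amount to every observation, which is why it plays the role of the intercept.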

In the context of designed experiments, we partitioned the systematic component into
explanatory terms (associated with treatments applied) and structural terms (associated
with the experimental structure, such as blocking or pseudo-replication). For regression
analysis, the same partition applies, although the structural component can be omitted
when no structure is present, as may be the case for observational studies. However, if the
units are structured, then it is important that the structure is incorporated into the model.
This extension of the model is not usually provided within software designed for regression
analysis and we discuss this further in Section 15.3 and Chapter 16.



EXAMPLE 12.1A: DIPLOID WHEAT 

Several morphological traits were measured for 190 seeds selected at random from a 
line of diploid wheat, Triticum monococcum, with the aim of identifying variables asso- 
ciated with differences in seed weight (Jing et al., 2007; Wheat Genetic Improvement 
Network (WGIN): www.wgin.org.uk). The variables measured were weight (mg), diam- 
eter (mm), length (mm), moisture content (%) and endosperm hardness (single-kernel 
characterization system index value). The data are in file triticum.dat which contains 
a variate DSeed to identify each seed in addition to variates Weight, Diameter, Length, 
Moisture and Hardness. A subset of the data is in Table 12.1 and the full set is given in
Table A.1.

Seed size, as measured by length, is expected to be a major contributor to differences 
in seed weight, and so, we start by examining the relationship between seed weight and 
seed length, using the scatter plot presented in Figure 12.2. 

The relationship between the two variates appears approximately linear; so, it makes
sense to fit a SLR that relates the weight of the ith seed, $\text{Weight}_i$, to the length of the ith
seed, $\text{Length}_i$, to investigate this relationship further. The statistical model is

$$\text{Weight}_i = \alpha + \beta\,\text{Length}_i + e_i ,$$






TABLE 12.1

First Four and Last Four Observations of Seeds of Diploid Wheat from a Study to Identify
Variables Associated with Variation in Seed Weight (Example 12.1A, Full Data in File
triticum.dat and Table A.1)

Seed    Weight    Length    Diameter    Moisture    Hardness
1       30.15     3.27      2.09        10.27       -16.63
2       35.51     3.65      2.34        10.61        -8.27
3       29.16     3.36      2.15        10.27       -21.45
4       16.82     2.77      1.79        11.05         4.13
187     27.66     3.60      2.31        10.88       -22.68
188     26.54     3.58      2.29        10.49         3.30
189     30.90     3.17      2.03        10.37       -17.83
190     18.94     2.45      1.62        10.08        -7.06

Source: Data from H.-C. Jing and K. Hammond-Kosack, Rothamsted Research.



where $e_i$ represents the deviation in the weight of the ith seed from the straight line
relationship and i runs from 1 to 190 to represent the 190 observations. Parameter $\alpha$
represents the intercept, the expected seed weight for zero seed length. From simple biological
arguments, we would expect the intercept to be zero, but this ignores aspects of
the statistical modelling process that we discuss further in Example 12.1B. This model
is written in symbolic form as

Response variable: Weight

Explanatory component: [1] + Length

where the variate Weight contains the observed seed weights and variate Length contains
the corresponding seed lengths.




FIGURE 12.2

Scatter plot of weight (mg) versus length (mm) for 190 diploid wheat seeds (Examples 12.1A and 12.1E).

To fit the SLR model, some assumptions must be made about the deviations, $e_i$. These
assumptions apply to any linear model, and so were presented in Section 4.1. Nevertheless,
we repeat them here for completeness.



Assumption 1

$$E(e_i) = 0 \quad \text{for } i = 1 \ldots N .$$

The expected value (function E) of each deviation is assumed to be zero. This means that
the population mean of the deviations is zero, which implies no systematic bias in the
observations. ■



Assumption 2

$$\mathrm{Var}(e_i) = \sigma^2 \quad \text{for } i = 1 \ldots N .$$

The variances (Var) of the deviations are the same for all units. This is also known as
homoscedasticity, or homogeneity of variances. ■



Assumption 3

$$\mathrm{Cov}(e_i, e_j) = 0 \quad \text{for all } i \neq j, \text{ and } i, j = 1 \ldots N .$$

The covariance (Cov) between deviations for two separate observations is zero, i.e. the
deviations are independent. ■

Assumption 4

$$e_i \sim \mathrm{Normal}(0, \sigma^2) .$$

The deviations follow a Normal distribution with mean 0 and variance $\sigma^2$. ■

In addition, we make an assumption on the explanatory variables: 



Assumption 5 

The values of the explanatory variables (factors or variates) are known without error. ■ 

Common violations of Assumptions 1 to 4 were described and discussed in Chapter 5.
Here, we reiterate that Assumption 3 is most often violated when data are collected from
the same source at different times, and that Assumption 4 is required to make any statistical
inferences valid that rely on the Normal distribution. Assumption 5, which states that
the explanatory variable is known or measured without error, is of particular importance
for regression models. This assumption may be realistic in experiments where levels of
the explanatory variate are controlled by the experimenter (e.g. fixed amounts of nitrogen
fertilizer to be applied to field plots). However, in observational studies, values of explanatory
variates are usually observed rather than under the control of an experimenter, and
these observations are often prone to error. This does not invalidate the analysis, but it does
change the interpretation of the fitted model and is discussed further in Section 12.7.

In Chapter 5, we introduced some diagnostic tools that can be used to check the plausibility
of assumptions made about the deviations in models with a single qualitative
explanatory variable. As the same assumptions about the deviations apply here, the same
tools can be used to check the validity of models with a single quantitative explanatory
variable (they are also appropriate for more complex regression models). These tools, which
are revisited in Chapter 13, use the residuals obtained from fitting the model,
and for regression models, it is particularly important to use standardized residuals for
this purpose (see Section 5.1.2). However, the assumption of a structured form for the
response, here a straight line, means that the form of the model must also be checked. We
refer to a mismatch between the observed pattern and fitted model as model misspecification,
and some additional diagnostic tools to deal with such situations are described in
Chapter 13.



12.2 Estimating the Model Parameters 

In Section 4.2, we outlined the principle of least-squares estimation, the method that finds
the best-fit model by minimizing the sum, across all observations, of the squares of the
differences between the observed data and the fitted values. This principle can be used for
any linear model, which includes the estimation of parameters for SLR. For the SLR model,
the fitted values can be written as

$$\hat{y}_i = \hat{\alpha} + \hat{\beta} x_i .$$

As stated earlier, the hats (^) over $y_i$, $\alpha$ and $\beta$ specify that they are estimates of population
values for which the true values are not known. The simple residuals (see Section 5.1.1),
which are estimates of the deviations, $e_i$, are computed as

$$\hat{e}_i = y_i - \hat{y}_i = y_i - (\hat{\alpha} + \hat{\beta} x_i) .$$

The quantity minimized to obtain the parameter estimates is the sum of these squared
residuals, which for the SLR model is

$$\sum_{i=1}^{N} \hat{e}_i^2 = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{N} (y_i - \hat{\alpha} - \hat{\beta} x_i)^2 .$$

As in Section 4.2, when minimized, this quantity is known as the residual sum of squares,
denoted as ResSS. Again, we do not present the mathematical details of the minimization
process here, but the interested reader can find them in Section C.3.

We can write estimates of the model parameters, $\alpha$ and $\beta$, in terms of sums of squares
and cross-products of the response and explanatory variates. The sum of squares for
the response variable is the sum, over all observations, of the squares of the differences
between the observed responses and their mean, written in mathematical terms as

$$SS_{yy} = \sum_{i=1}^{N} (y_i - \bar{y})^2 .$$

The sum of squares is simply equal to the unbiased sample variance multiplied by the
degrees of freedom for this variance, N - 1 (see Section 2.1). We denote this sum of squares
as $SS_{yy}$, where 'SS' denotes a sum of squares and the subscript identifies the relevant variate.

The sum of squares for the explanatory variate takes a similar form: the sum, over all
observations, of the squares of the differences between the values of the explanatory variate
and their mean, i.e.

$$SS_{xx} = \sum_{i=1}^{N} (x_i - \bar{x})^2 .$$

Again, the sum of squares for the explanatory variate equals its unbiased sample variance
multiplied by N - 1.

Finally, the sum of cross-products between the response and explanatory variate is
calculated as the difference between the observed response on each unit and the mean
response, multiplied by the difference between the value of the explanatory variate on
that same unit and the mean of the explanatory variate. These quantities are then summed
over all observed units. This is mathematically written as

$$SS_{xy} = \sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y}) .$$

The symbol for the sum of cross-products, $SS_{xy}$, refers to the two variables used to form it,
and the sum of cross-products is equal to the unbiased sample covariance between the two
variates, $s_{xy}$ (Section 2.5), multiplied by N - 1. Note that the sum of cross-products between
a variate and itself is simply the sum of squares for that variate.

The least-squares estimate of the unknown population slope parameter, $\beta$, is equal to the
sum of cross-products between the response and explanatory variate, divided by the sum
of squares for the explanatory variate, i.e.

$$\hat{\beta} = \frac{SS_{xy}}{SS_{xx}} .$$

The estimate of the unknown population intercept parameter, $\alpha$, can then be written in
terms of the estimated slope and the sample means for the response and explanatory variates,
thus

$$\hat{\alpha} = \bar{y} - \hat{\beta} \bar{x} .$$

It follows directly that the best fitting line passes through both sample means, as when
$x_i = \bar{x}$, the fitted value is the mean response, $\hat{y}_i = \bar{y}$.
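The estimation formulae can be sketched in a few lines of plain Python. The data below are made up purely for illustration, and the function names are hypothetical; the sketch computes the slope and intercept from the sums of squares and cross-products, and then checks numerically that moving either estimate away from its least-squares value increases the sum of squared residuals.

```python
def slr_estimates(x, y):
    """Least-squares intercept and slope from sums of squares and cross-products."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta_hat = ss_xy / ss_xx
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


def sum_squared_residuals(alpha, beta, x, y):
    """Sum over observations of (observed response - value on the line)^2."""
    return sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))


# Illustrative data only (not taken from the book).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

alpha_hat, beta_hat = slr_estimates(x, y)
res_ss = sum_squared_residuals(alpha_hat, beta_hat, x, y)

# Any other line gives a larger sum of squared residuals.
worse_intercept = sum_squared_residuals(alpha_hat + 0.1, beta_hat, x, y)
worse_slope = sum_squared_residuals(alpha_hat, beta_hat - 0.1, x, y)

# The fitted line passes through the point of sample means.
x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
fitted_at_mean = alpha_hat + beta_hat * x_bar
```

The comparison with perturbed lines is only a spot check at two alternatives, not a proof of the minimum.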



EXAMPLE 12.1B: DIPLOID WHEAT 

We can now calculate parameter estimates for the SLR model for the diploid wheat data,
which describe the variation in seed weight as a function of the explanatory variable
seed length.

For these data, the sum of squares for the explanatory variate, seed length, takes
the value $SS_{xx}$ = 19.2699, and the sum of cross-products between seed weight and seed
length is $SS_{xy}$ = 330.9297. The mean length is $\bar{x}$ = 3.295 mm and the mean weight is
$\bar{y}$ = 28.658 mg. The parameter estimates for the SLR model are therefore

$$\hat{\beta} = \frac{SS_{xy}}{SS_{xx}} = \frac{330.9297}{19.2699} = 17.173 ,$$

$$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} = 28.658 - (17.173 \times 3.295) = -27.931 .$$

The fitted model can therefore be written as

$$\widehat{\text{Weight}}_i = -27.931 + 17.173\,\text{Length}_i ,$$

where the hat over the variable name denotes the estimated fitted value. The units of
the intercept and slope here are mg and mg/mm, respectively, and an increase of 1 mm
in seed length is expected to produce an increase of 17.17 mg in seed weight. The intercept
represents the estimated average weight for seeds of length zero (i.e. -27.93 mg).
Biologically, this is a startling value for two reasons: we clearly cannot have negative
seed weights, and we expect a seed with zero length to have zero weight. This means
we need to check that the model is appropriate for the data, but it does not necessarily
mean that the model is inappropriate. The fitted model represents the best fitting line
over the range of observed seed lengths (in this case from 2.45 to 4.13 mm). If this line
is not representative of the unseen relationship over the range from 0 to 2.45 mm, then
it is possible for the predicted value of zero length to be inaccurate even if the model
is a good representation of the observed data. In this example, a length of 0 mm corresponds
to an extrapolation far outside the range of the observed data. We discuss the
distinction between interpolation and extrapolation in Section 12.5.
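The arithmetic in this example can be reproduced from the four summary values quoted above. The sketch below uses only those printed values; the helper `predicted_weight` is a hypothetical name. Because the means are printed to three decimal places, the intercept recomputed here agrees with the quoted -27.931 only to about two decimal places, presumably because the printed estimate was obtained from unrounded quantities.

```python
# Summary statistics as printed in Example 12.1B.
ss_xx = 19.2699    # sum of squares for seed length
ss_xy = 330.9297   # sum of cross-products between seed weight and seed length
x_bar = 3.295      # mean length (mm)
y_bar = 28.658     # mean weight (mg)

beta_hat = ss_xy / ss_xx               # slope, mg per mm
alpha_hat = y_bar - beta_hat * x_bar   # intercept, mg


def predicted_weight(length_mm):
    """Fitted seed weight (mg) for a given seed length (mm)."""
    return alpha_hat + beta_hat * length_mm


print(f"slope = {beta_hat:.3f} mg/mm, intercept = {alpha_hat:.2f} mg")
```

Calling `predicted_weight` for a length inside the observed range is an interpolation, whereas `predicted_weight(0.0)` returns the intercept and is the extrapolation discussed above.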

The fitted model is shown in Figure 12.3 together with the observed data (and a 95%
confidence interval [CI] for the fitted line, explained in Section 12.5).

You should always inspect the behaviour of the fitted model, particularly for more 
extreme values of the explanatory variate. Figure 12.3 suggests some curvature in the 
relationship as all observations are above the fitted line for the shortest and longest 
seeds (length < 2.6 or > 4.8 mm). A composite set of residual plots for this model, based 
on standardized residuals, is presented in Figure 12.4. The histogram and the Normal 
plot do not indicate strong departures from a Normal distribution for the residuals 
(see Section 5.2.3), and the absolute residual plot shows no evidence of variance het- 
erogeneity. The fitted value plot suggests some trend in the residuals, with more nega- 
tive residuals for intermediate weights and largely positive residuals for the lightest 
and heaviest seeds. This gives further evidence of some curvature in the relationship, 
although a straight line appears to be a reasonable overall approximation. We discuss 
the use of residual plots to investigate the fit of this model in more detail in Chapter 13. 
However, for now, we consider that the data are reasonably consistent with the assumptions
underlying the linear model.

FIGURE 12.3

Scatter plot of weight versus length for 190 diploid wheat seeds together with the fitted straight line ( — ) and 
95% CIs ( — ) for the expected mean response (Example 12.1B). 




[Figure 12.4 panels not reproduced; recovered axis labels: Fitted value, Standardized residual, Normal quantile.]

FIGURE 12.4

A composite set of residual plots after fitting a SLR model to the diploid wheat data (Example 12.1B).



12.3 Assessing the Importance of the Model

As for models containing qualitative explanatory variables (or factors), ANOVA can be 
used to assess how well the single explanatory variate model describes the variation in the 
observed response. In particular, ANOVA assesses whether the variation in the response 
that is explained by the explanatory variate is larger than the background variation. In 
more general linear regression models (Chapters 14 and 15), ANOVA can be used to fur- 
ther assess the relative importance of several quantitative or qualitative explanatory vari- 
ables, or both, in explaining variation in a response variable. 

In the case of SLR (and like the single-factor model of Chapter 4), we use ANOVA to 
partition the total variation of the response into the portion explained by the systematic 
component of the model (here, a straight line depending on the explanatory variate) and 
the residual, or unexplained, portion attributed to the random component of the model. As 
described in Section 4.3, ANOVA quantifies variation in terms of sums of squares. Hence, 
the total variation (TotSS) is partitioned into the variation due to the model, i.e. the regres- 
sion line (ModSS, the model sum of squares) and the residual variation (ResSS, the residual 
sum of squares) so that 



$$\mathrm{TotSS} = \mathrm{ModSS} + \mathrm{ResSS}. \qquad (12.2)$$

As with the single-factor model (Section 4.3), we calculate the total sum of squares by tak- 
ing the difference between each observed value and the overall mean response, squaring 
these differences and then adding them together. This is exactly the form of the sum of 
squares for the response, introduced earlier, so 

$$\mathrm{TotSS} = \sum_{i=1}^{N} (y_i - \bar{y})^2 = SS_{yy}.$$

The model sum of squares represents the variation in the response accounted for, or 
explained by, the fitted straight line and is calculated as the sum, over all observations, of 
the square of the difference between each fitted value and the overall mean. This equals 
the square of the sum of cross-products between the response and explanatory variate, 
divided by the sum of squares for the explanatory variate; so, we can write this algebra- 
ically as 



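$$\mathrm{ModSS} = \sum_{i=1}^{N} (\hat{y}_i - \bar{y})^2 = \frac{(SS_{xy})^2}{SS_{xx}}, \quad \text{where } \hat{y}_i = \hat{\alpha} + \hat{\beta} x_i.$$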



Note the similarity between this expression and the TrtSS in Section 4.3.1 for the single- 
factor model; the estimated treatment mean in that case is replaced by the estimated fitted 
value from the regression here. 
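
The equality of the two forms of ModSS can be checked in one step. The sketch below assumes the
usual least-squares estimates from Section 12.2, written here as $\hat{\beta} = SS_{xy}/SS_{xx}$ and
$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$ (the notation is an assumption; the estimates themselves
are standard), so that each fitted value can be expressed as a deviation from the overall mean:

```latex
\begin{align*}
\hat{y}_i - \bar{y}
  &= \hat{\alpha} + \hat{\beta} x_i - \bar{y}
   = \hat{\beta}\,(x_i - \bar{x}), \\
\mathrm{ModSS}
  &= \sum_{i=1}^{N} (\hat{y}_i - \bar{y})^2
   = \hat{\beta}^2 \sum_{i=1}^{N} (x_i - \bar{x})^2
   = \hat{\beta}^2\, SS_{xx}
   = \frac{(SS_{xy})^2}{SS_{xx}} .
\end{align*}
```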

Finally, recall from Section 12.2 that the residual sum of squares is calculated as the
sum, over all observations, of the square of the difference between each observed response
and its associated fitted value. Alternatively, we can rearrange Equation 12.2 to show that
the residual sum of squares is equal to the total sum of squares minus the model sum of
squares, and we can therefore write this algebraically as

$$\mathrm{ResSS} = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 = SS_{yy} - \frac{(SS_{xy})^2}{SS_{xx}}. \qquad (12.3)$$

Once any two of the sums of squares are known, the third can be obtained by rearrangement
of Equation 12.2.

As in the case of the single-factor model, we also need to calculate the amount of information
associated with each sum of squares, quantified by the degrees of freedom. Again,
there is a partition of the total degrees of freedom (TotDF) into a portion associated with
the SLR (ModDF) and a portion associated with the residual variation (ResDF), such that

$$\mathrm{TotDF} = \mathrm{ModDF} + \mathrm{ResDF}.$$

To calculate the df associated with each component, we use the same recipe developed in
Section 4.3.1. The individual components in each sum of squares consist of fitted values
from some model minus an adjustment. The df for a sum of squares counts the number
of parameters required in the model used to calculate the fitted values, minus the
number required to calculate the adjustments. In the TotSS, the values are the individual
observations (requires N parameters) and the adjustment is the overall mean (requires
one parameter). Hence, the total df is TotDF = N - 1. In the ModSS, the values are the fitted
values from the SLR (requires two parameters) and the adjustment is again the overall
mean (requires one parameter). Hence, the model degrees of freedom are ModDF = 1. And
finally, in the ResSS, the values are the individual observations (requires N parameters)
and the adjustment is the fitted values from the SLR (requires two parameters). Hence, the
residual degrees of freedom are N - 2. Alternatively, the ResDF can be found by subtraction
as

$$\mathrm{ResDF} = \mathrm{TotDF} - \mathrm{ModDF} = N - 2.$$

The model and residual sums of squares are then divided by their corresponding degrees 
of freedom to produce their respective mean squares. These are then on a common scale 
and so quantify the amount of variation associated with each component of the model. Of
particular interest is the residual mean square (ResMS), which is an estimate of the background
variability, denoted as $s^2$, and can be mathematically written as

$$s^2 = \mathrm{ResMS} = \frac{\mathrm{ResSS}}{N - 2}.$$



Intuitively, the ResMS quantifies the variation of all observations around the true regression
line and - if the model is a good description of the data - it should arise from background
variation alone. The model mean square (ModMS) arises from variation associated
with the straight line relationship. If there is no linear dependence of the response variable
on the explanatory variable, then the ModMS can arise only from chance background
variation, and so should be of similar size to the ResMS, allowing for sampling variation.
This concept is formalized by consideration of the expected values of these mean squares.
The expected value of the residual mean square is the true background variation, $\sigma^2$, i.e.

$$E(\mathrm{ResMS}) = \sigma^2.$$

The expected value of the model mean square is equal to the true background variation,
plus the sum of squares for the explanatory variate, $SS_{xx}$, multiplied by the square of the
true slope parameter, $\beta$, i.e.

$$E(\mathrm{ModMS}) = \sigma^2 + \beta^2 SS_{xx}.$$

If there is no linear dependence of the response variate on the explanatory variate, then the 
true slope parameter is zero ($\beta = 0$), and the second term in the expectation of ModMS also
becomes zero, regardless of the values of the explanatory variate. Both mean squares will
then have the same expected value. This is the basis for the use of an F-test in the context
of ANOVA for a SLR, testing the null hypothesis of no linear dependence of the response
variable on the explanatory variable, formally expressed as $H_0$: $\beta = 0$. This null hypothesis
is compared against an alternative hypothesis of the presence of a linear dependence
between these variables, namely $H_1$: $\beta \neq 0$, in which case the ModMS is expected to be
larger than the ResMS.
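
The behaviour of the expected model mean square can be checked by simulation. The following is a
minimal sketch, not part of the original analysis: the design (x = 1, ..., 10), the background
standard deviation (1.0), the slope values, the seed and the function names are all hypothetical
choices made for illustration. It averages the ModMS over many simulated data sets and compares
the average with $\sigma^2 + \beta^2 SS_{xx}$.

```python
import random


def model_mean_square(x, y):
    """Return ModMS = (SS_xy)^2 / SS_xx for an SLR (the model has 1 df)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return ss_xy ** 2 / ss_xx


def average_model_ms(beta, sigma, x, n_sim, rng):
    """Average ModMS over n_sim data sets simulated as y = beta*x + deviation."""
    total = 0.0
    for _ in range(n_sim):
        y = [beta * xi + rng.gauss(0.0, sigma) for xi in x]
        total += model_mean_square(x, y)
    return total / n_sim


rng = random.Random(12345)             # arbitrary fixed seed, for repeatability
x = [float(i) for i in range(1, 11)]   # hypothetical design: x = 1, 2, ..., 10
x_bar = sum(x) / len(x)
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
sigma = 1.0                            # assumed true background SD

averages = {}
for beta in (0.0, 0.2):
    expected = sigma ** 2 + beta ** 2 * ss_xx
    averages[beta] = average_model_ms(beta, sigma, x, 4000, rng)
    print(f"beta = {beta}: expected ModMS = {expected:.2f}, "
          f"simulated average = {averages[beta]:.2f}")
```

With a zero slope, the simulated average should sit close to the background variance alone; with
a non-zero slope, it should be inflated by roughly $\beta^2 SS_{xx}$, which is what makes the
variance ratio informative.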

We obtain the test statistic by dividing the model mean square by the residual mean 
square, and the quotient is the variance ratio or observed F-statistic, which we denote as 
F, i.e. 



$$F = \frac{\mathrm{ModMS}}{\mathrm{ResMS}}.$$

If the null hypothesis is true, we expect the value of the variance ratio to be close to 1. If 
this ratio is larger, so that the variation associated with the regression line is greater than 
the background variation, then this gives evidence that the slope of the line is not zero. 

More formally, under the null hypothesis, such a ratio of two independent mean
squares has an F-distribution, and the amount of evidence can be quantified. Here, the
F-distribution numerator df is 1 (ModDF) and the denominator df is N - 2 (ResDF). As
in the previous chapters, for clarity, we usually specify the observed variance ratio with
its df as subscripts, for example, $F_{1,N-2}$. If the observed statistic $F_{1,N-2}$ is larger than the
$100(1 - \alpha_s)$th percentile of this F-distribution, denoted $F_{1,N-2}^{[\alpha_s]}$, then the null hypothesis is
rejected at significance level $\alpha_s$. Equivalently, an observed significance level, P, can be
calculated as

$$P = \mathrm{Prob}(\mathbf{F}_{1,N-2} > F_{1,N-2}),$$

where $\mathbf{F}_{1,N-2}$ denotes a random variable with an F-distribution with 1 and N - 2 df. All
the above calculations are conveniently summarized in an ANOVA table, as presented in
Table 12.2.



TABLE 12.2

Structure of the ANOVA Table for a SLR

Source of Variation   df      Sum of Squares   Mean Square             Variance Ratio     P
Model                 1       ModSS            ModMS = ModSS/1         F = ModMS/ResMS    Prob($\mathbf{F}_{1,N-2}$ > F)
Residual              N - 2   ResSS            ResMS = ResSS/(N - 2)
Total                 N - 1   TotSS


EXAMPLE 12.1C: DIPLOID WHEAT

Consider again the diploid wheat seed data. Values of $SS_{xx}$ = 19.2699 and $SS_{xy}$ = 330.9297
were given in Example 12.1B, and the total sum of squares is TotSS = $SS_{yy}$ = 7294.4090. The
model sum of squares can be calculated from the sums of squares and cross-products as

$$\mathrm{ModSS} = \frac{(SS_{xy})^2}{SS_{xx}} = \frac{(330.9297)^2}{19.2699} = 5683.1753.$$

By subtraction, we obtain the residual sum of squares as

ResSS = TotSS - ModSS = 7294.4090 - 5683.1753 = 1611.2338 .

All these values, together with their corresponding degrees of freedom, are combined
to give the ANOVA table shown in Table 12.3.

TABLE 12.3

ANOVA Table for a SLR Model for Seed Weight with Explanatory Variate Seed Length
(Example 12.1C)

Source of Variation   df    Sum of Squares   Mean Square   Variance Ratio   P
Model                 1     5683.1753        5683.1753     663.117          < 0.001
Residual              188   1611.2338        8.5704
Total                 189   7294.4090

The observed value of $F_{1,188}$ = 663.117 is huge: the 0.1% critical value of the F-distribution
with 1 and 188 df is $F_{1,188}^{[0.001]}$ = 11.176; so, P < 0.001. Therefore, we have very strong evidence
to reject the null hypothesis that the slope parameter is zero, and we conclude
that there is a statistically significant linear relationship for seed weight in terms of
seed length.
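
The arithmetic of Example 12.1C can be retraced from the quoted summary statistics alone. The
sketch below is illustrative: the function name `anova_from_summary` and the dictionary keys are
hypothetical, and the inputs are the rounded values printed above, so the results agree with
Table 12.3 only up to rounding. The critical value is taken from the text rather than computed.

```python
def anova_from_summary(n, ss_xx, ss_xy, ss_yy):
    """Build the SLR ANOVA quantities from N, SS_xx, SS_xy and SS_yy."""
    mod_ss = ss_xy ** 2 / ss_xx        # model sum of squares
    res_ss = ss_yy - mod_ss            # residual SS by subtraction (Equation 12.2)
    mod_df, res_df = 1, n - 2
    mod_ms = mod_ss / mod_df
    res_ms = res_ss / res_df           # s^2, the estimate of background variation
    return {
        "mod_df": mod_df, "res_df": res_df, "tot_df": n - 1,
        "mod_ss": mod_ss, "res_ss": res_ss, "tot_ss": ss_yy,
        "mod_ms": mod_ms, "res_ms": res_ms,
        "f": mod_ms / res_ms,          # variance ratio
    }


# Summary statistics quoted in Examples 12.1B and 12.1C (rounded).
table = anova_from_summary(n=190, ss_xx=19.2699, ss_xy=330.9297, ss_yy=7294.4090)

F_CRIT_0001 = 11.176  # 0.1% critical value for 1 and 188 df, as quoted in the text

print(f"Model     df={table['mod_df']:>3}  SS={table['mod_ss']:.4f}  "
      f"MS={table['mod_ms']:.4f}  F={table['f']:.3f}")
print(f"Residual  df={table['res_df']:>3}  SS={table['res_ss']:.4f}  "
      f"MS={table['res_ms']:.4f}")
print(f"Total     df={table['tot_df']:>3}  SS={table['tot_ss']:.4f}")
print("Reject H0 at the 0.1% level:", table["f"] > F_CRIT_0001)
```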



12.4 Properties of the Model Parameters 

Having fitted the SLR model and obtained estimates of the model parameters, we can use 
statistical theory to make further inferences about their underlying, unknown values. If 
the deviations follow a Normal distribution (Assumption 4, Section 12.1), then the estimates
of $\alpha$ and $\beta$ also follow Normal distributions. In each case, the mean of the distribution
is the unknown population parameter, and for this reason, the estimates are called
unbiased. Their variances are functions of the explanatory variate, the number of observations
and the unknown population variance $\sigma^2$. We estimate these variances by replacing
$\sigma^2$ by its estimate $s^2$, the residual mean square (see Section 12.3), giving

$$\widehat{\mathrm{Var}}(\hat{\alpha}) = s^2 \times \left( \frac{1}{N} + \frac{\bar{x}^2}{SS_{xx}} \right), \qquad
\widehat{\mathrm{Var}}(\hat{\beta}) = s^2 \times \left( \frac{1}{SS_{xx}} \right).$$



The estimated standard error of a parameter estimate, SE(), is defined as the square root
of its estimated variance.

It is sometimes of interest to test whether one of the parameters is equal to a specified
value, usually zero. Hypotheses of this type can be evaluated by the one-sample t-test
presented in Section 2.4.1. For example, to test the null hypothesis that the slope takes any
pre-defined value c, i.e. $H_0$: $\beta = c$, we use the statistic

$$t_{N-2} = \frac{\hat{\beta} - c}{\mathrm{SE}(\hat{\beta})},$$

which has a t-distribution with N - 2 degrees of freedom under the null hypothesis. An
analogous test can be constructed for the intercept $\alpha$.

The test of the null hypothesis that the slope parameter, $\beta$, equals zero, i.e. $H_0$: $\beta = 0$,
against a two-sided alternative hypothesis, i.e. $H_1$: $\beta \neq 0$, is equivalent to the test of no
linear dependence of the response variate on the explanatory variate against that of some
linear dependence. Thus, the t-test is equivalent to the F-test obtained from the ANOVA
table presented above, with $F_{1,N-2} = (t_{N-2})^2$.
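
The equivalence of the two tests follows directly from the formulae already given. As a short
sketch, assuming the least-squares slope $\hat{\beta} = SS_{xy}/SS_{xx}$ from Section 12.2 and using
$\widehat{\mathrm{Var}}(\hat{\beta}) = s^2/SS_{xx}$ with $s^2$ = ResMS and ModDF = 1:

```latex
\begin{align*}
(t_{N-2})^2
  = \frac{\hat{\beta}^2}{\mathrm{SE}(\hat{\beta})^2}
  = \frac{\hat{\beta}^2\, SS_{xx}}{s^2}
  = \frac{(SS_{xy})^2 / SS_{xx}}{\mathrm{ResMS}}
  = \frac{\mathrm{ModSS}/1}{\mathrm{ResMS}}
  = \frac{\mathrm{ModMS}}{\mathrm{ResMS}}
  = F_{1,N-2} .
\end{align*}
```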

The test of the null hypothesis that the intercept parameter, $\alpha$, equals zero, i.e. $H_0$: $\alpha = 0$,
against a two-sided alternative hypothesis, i.e. $H_1$: $\alpha \neq 0$, is used to determine if the model
passes through the origin, i.e. that the expected value of the response variable is zero
when the explanatory variate is zero. If the null hypothesis is not rejected, then we might fit a
model that contains only a slope parameter, although this model can have some undesirable
properties which are discussed in Section 12.9.2.

We can calculate the $100(1 - \alpha_s)$% CI associated with these t-tests for the population
parameters as

$$\left( \hat{\alpha} - t_{N-2}^{[\alpha_s/2]} \times \mathrm{SE}(\hat{\alpha}),\ \hat{\alpha} + t_{N-2}^{[\alpha_s/2]} \times \mathrm{SE}(\hat{\alpha}) \right),$$
$$\left( \hat{\beta} - t_{N-2}^{[\alpha_s/2]} \times \mathrm{SE}(\hat{\beta}),\ \hat{\beta} + t_{N-2}^{[\alpha_s/2]} \times \mathrm{SE}(\hat{\beta}) \right),$$

where $t_{N-2}^{[\alpha_s/2]}$ is the $100(1 - \alpha_s/2)$th percentile of a t-distribution with N - 2 degrees of freedom.



EXAMPLE 12.1D: DIPLOID WHEAT 

Using the summary statistics and parameter estimates obtained in Examples 12.1B and
12.1C, we can calculate the estimated variances for the intercept and slope parameters as

$$\widehat{\mathrm{Var}}(\hat{\alpha}) = s^2 \times \left( \frac{1}{N} + \frac{\bar{x}^2}{SS_{xx}} \right) = 8.5704 \times \left( \frac{1}{190} + \frac{3.295^2}{19.2699} \right) = 4.8743,$$
$$\widehat{\mathrm{Var}}(\hat{\beta}) = s^2 \times \left( \frac{1}{SS_{xx}} \right) = 8.5704 \times \left( \frac{1}{19.2699} \right) = 0.4448.$$

Hence, the estimated standard errors for $\hat{\alpha}$ and $\hat{\beta}$ are 2.2078 and 0.6669, respectively. For
comparison with the ANOVA, we evaluate the evidence of linear dependence of weight
on length using a t-test of the null hypothesis $H_0$: $\beta = 0$ against $H_1$: $\beta \neq 0$. The observed
t-statistic is

$$t_{N-2} = \frac{\hat{\beta}}{\mathrm{SE}(\hat{\beta})} = \frac{17.173}{0.667} = 25.751.$$

The absolute value of this test statistic is compared with a critical value of the t-distribution
with 188 df; for example, for a two-sided test with a significance level of
5%, this critical value is $t_{188}^{[0.025]}$ = 1.973 and the null hypothesis is rejected. In fact, the
test statistic is larger than the 0.1% critical value, $t_{188}^{[0.0005]}$ = 3.343; so, we conclude that
there is very strong evidence of a linear dependence of weight on length, in agreement
with the ANOVA in Example 12.1C. As expected, the square of the observed t-statistic,
$25.751^2$ = 663.117, is equal to the value of the F-statistic obtained in Example 12.1C.
Properties of the parameter estimates, including their estimated standard errors, the
t-statistics for testing the null hypothesis that each parameter is equal to zero and the
associated significance levels are often summarized in a form similar to Table 12.4.
Variations in this form are commonly produced by statistical software, with parameters
implicitly identified via the associated explanatory variate.

TABLE 12.4

Parameter Estimates with Standard Errors (SEs), t-Statistics (t) and Observed Significance Levels
(P) for a SLR Model for Seed Weight with Explanatory Variate Seed Length (Example 12.1D)

Term     Parameter   Estimate   SE       t         P
[1]      $\alpha$    -27.931    2.2078   -12.651   < 0.001
Length   $\beta$     17.173     0.6669   25.751    < 0.001

Using the information accumulated so far, we can now obtain CIs for both model
parameters. A 95% CI for the intercept $\alpha$ is obtained as

(-27.931 - (1.973 × 2.208), -27.931 + (1.973 × 2.208)) = (-32.286, -23.576) ,

and for the slope parameter, $\beta$,

(17.173 - (1.973 × 0.667), 17.173 + (1.973 × 0.667)) = (15.858, 18.489) .
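
The calculations in Example 12.1D can be retraced with a few lines of code. This is a sketch
under stated assumptions: the inputs are the rounded values quoted in Examples 12.1B to 12.1D,
the critical value 1.973 is taken from the text rather than computed, and the helper name
`parameter_summary` is hypothetical. Small differences from Table 12.4 in the last digit are
therefore to be expected.

```python
import math

# Rounded summary values quoted in Examples 12.1B-12.1D.
N = 190
S2 = 8.5704          # residual mean square, s^2
SS_XX = 19.2699      # sum of squares of seed length
X_BAR = 3.295        # mean seed length as used in the variance calculation
ALPHA_HAT = -27.931  # estimated intercept
BETA_HAT = 17.173    # estimated slope
T_CRIT = 1.973       # 97.5th percentile of t with 188 df, as quoted in the text


def parameter_summary(estimate, variance, t_crit):
    """Return SE, t-statistic (for H0: parameter = 0) and CI for one parameter."""
    se = math.sqrt(variance)
    return {
        "estimate": estimate,
        "se": se,
        "t": estimate / se,
        "ci": (estimate - t_crit * se, estimate + t_crit * se),
    }


var_alpha = S2 * (1 / N + X_BAR ** 2 / SS_XX)
var_beta = S2 / SS_XX

intercept = parameter_summary(ALPHA_HAT, var_alpha, T_CRIT)
slope = parameter_summary(BETA_HAT, var_beta, T_CRIT)

for name, row in (("[1]", intercept), ("Length", slope)):
    low, high = row["ci"]
    print(f"{name:<7} est={row['estimate']:.3f}  SE={row['se']:.4f}  "
          f"t={row['t']:.3f}  95% CI=({low:.3f}, {high:.3f})")

# The squared t-statistic for the slope reproduces the ANOVA variance ratio
# (up to rounding of the inputs).
print(f"t^2 for slope = {slope['t'] ** 2:.2f}")
```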

The CIs for the intercept and slope parameters often imply that there is a large set of
possible fitted lines that are consistent with the data. It is therefore helpful to generate
predictions and CIs for the fitted response rather than for individual parameters.



12.5 Using the Fitted Model to Predict Responses 

Once a SLR model has been fitted, we can use the parameter estimates to predict the 
response for a given value of the explanatory variate. In the general case, prediction uses a 
fitted model to estimate the expected response for given values of all explanatory variables 
(Section 1.4). Here, we consider two different forms of prediction for a specified value of 
the explanatory variate, or prediction point, denoted $x_{pred}$. First, we predict the expected
mean response, denoted $\mu(x_{pred})$, which is equal to the fitted response at $x_{pred}$ based on the
observed sample. For clarity, we use the notation $\hat{\mu}(x_{pred})$ for the estimate, which is
calculated as

$$\hat{\mu}(x_{pred}) = \hat{\alpha} + \hat{\beta} x_{pred}.$$

Second, we predict the value of a new individual response, which we denote $y_{new}(x_{pred})$.
This prediction is equal to the fitted response at the prediction point, plus the deviation
associated with the new observation, denoted $e_{new}$, or

$$\hat{y}_{new}(x_{pred}) = \hat{\alpha} + \hat{\beta} x_{pred} + e_{new}.$$

The new deviation is unobserved and so unknown. If we assume it follows a Normal
distribution with zero mean (in line with our assumptions on the deviations), then it is
estimated at its expected value as zero. This prediction is therefore equal in value to the
expected mean response given above. However, although the presence of the deviation
does not affect the value of the prediction, it does influence its variance and standard error.

The estimated variance of the expected mean response at the prediction point is
written as

$$\widehat{\mathrm{Var}}(\hat{\mu}(x_{pred})) = s^2 \times \left( \frac{1}{N} + \frac{(x_{pred} - \bar{x})^2}{SS_{xx}} \right).$$

Again, here, the residual mean square, $s^2$, is used as an estimate of the unknown population
variance, $\sigma^2$. The prediction variance takes its minimum value when $x_{pred}$ is equal to
the sample mean of the explanatory variate, $x_{pred} = \bar{x}$, at which point the uncertainty
associated with the fitted line is minimized and the second term (within parentheses) is equal
to zero. As $x_{pred}$ moves away from the sample mean, this variance increases as uncertainty
in the fitted response also increases.

The prediction for a new observation was written as the expected response plus a new
deviation. The new deviation is independent of the fitted line and assumed to have variance
equal to the population variance, estimated by $s^2$. The estimated variance of a prediction
for a new observation is therefore equal to that for the expected mean response plus
the estimated variance of the new deviation, which is

$$\widehat{\mathrm{Var}}(\hat{y}_{new}(x_{pred})) = s^2 \times \left( 1 + \frac{1}{N} + \frac{(x_{pred} - \bar{x})^2}{SS_{xx}} \right).$$

Again, the minimum variance is obtained when $x_{pred} = \bar{x}$, and as $x_{pred}$ moves away from
$\bar{x}$, the variance increases. The additional variance of the new deviation means that this
will always be greater than the variance of the expected mean response. For both types of
prediction, the estimated SE is the square root of the estimated variance.

Finally, $100(1 - \alpha_s)$% CIs for the two types of prediction are obtained as

$$\left( \hat{\mu}(x_{pred}) - t_{N-2}^{[\alpha_s/2]} \times \mathrm{SE}[\hat{\mu}(x_{pred})],\ \hat{\mu}(x_{pred}) + t_{N-2}^{[\alpha_s/2]} \times \mathrm{SE}[\hat{\mu}(x_{pred})] \right),$$
$$\left( \hat{y}_{new}(x_{pred}) - t_{N-2}^{[\alpha_s/2]} \times \mathrm{SE}[\hat{y}_{new}(x_{pred})],\ \hat{y}_{new}(x_{pred}) + t_{N-2}^{[\alpha_s/2]} \times \mathrm{SE}[\hat{y}_{new}(x_{pred})] \right),$$

respectively, where $t_{N-2}^{[\alpha_s/2]}$ is the $100(1 - \alpha_s/2)$th percentile of the t-distribution with N - 2
degrees of freedom (the residual df). Figure 12.5 shows both types of CIs across the range
of observed values of an explanatory variate x. The confidence limits have been joined to
form envelopes, showing the CIs at each point on the fitted line. Since the estimated SEs of
the predictions increase as the value of the explanatory variate moves away from its mean,
the width of the CIs also increases on moving towards the ends of the range of the explanatory
variate. Also, as a consequence of the difference between their variances, the CIs for
the expected response are always narrower than those for a new observation.

FIGURE 12.5

SLR prediction ( — ) with CIs for the mean response $\mu(x)$ (- ■ -) and for prediction of a new observation $y_{new}(x)$.



EXAMPLE 12.1E: DIPLOID WHEAT 

To get a more helpful measure of the uncertainty in the fitted line for the seed weight
data, we calculate the 95% CIs for the expected mean responses for seeds with lengths
equal to the smallest, mean and largest values of the sample. The shortest observed seed
length is $x_{pred}$ = 2.45 mm, with fitted response

$$\hat{\mu}(x_{pred} = 2.45) = -27.931 + (17.173 \times 2.45) = 14.144.$$

This prediction has estimated variance

$$\widehat{\mathrm{Var}}(\hat{\mu}(x_{pred} = 2.45)) = s^2 \times \left( \frac{1}{N} + \frac{(x_{pred} - \bar{x})^2}{SS_{xx}} \right) = 8.569 \times \left( \frac{1}{190} + \frac{(2.45 - 3.295)^2}{19.268} \right) = 0.3628,$$

which corresponds to an estimated standard error of $\mathrm{SE}(\hat{\mu}(x_{pred} = 2.45))$ = 0.6023. The
residual df is 188 (Table 12.3), and calculation of a 95% CI requires the 97.5th percentile
of the t-distribution with 188 df, equal to $t_{188}^{[0.025]}$ = 1.973. Hence, a 95% CI for the expected
mean response at $x_{pred}$ = 2.45 is calculated as

(14.144 - (1.973 × 0.6023), 14.144 + (1.973 × 0.6023)) = (12.955, 15.335) .

Similarly, the 95% CI for the expected mean response at the average seed length
observed from our sample (i.e. $x_{pred} = \bar{x}$ = 3.30 with SE = 0.2124) is

(28.741 - (1.973 × 0.2124), 28.741 + (1.973 × 0.2124)) = (28.322, 29.160) .

Finally, the 95% CI for the expected mean response at the longest seed length observed
(i.e. $x_{pred}$ = 4.13 with SE = 0.5959) is

(42.995 - (1.973 × 0.5959), 42.995 + (1.973 × 0.5959)) = (41.819, 44.170) .

As expected, the CI for the predicted response at $x_{pred}$ = 3.30 is much narrower (range
0.84) than that for the prediction at $x_{pred}$ = 2.45 (range 2.37) or at $x_{pred}$ = 4.13 (range 2.35).
Figure 12.3 shows 95% CIs for the predicted response across the full range of the
explanatory variate.
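
Both types of prediction can be computed from the same summary quantities. The sketch below
is illustrative only: the function name `predict` and its `new_observation` flag are hypothetical,
the inputs are the rounded values quoted in Examples 12.1C and 12.1D, and the critical value
is the one given in the text. The intervals for a new observation are not reported in the
example; they are included simply to show the effect of the extra variance term.

```python
import math

# Rounded values quoted in Examples 12.1C and 12.1D.
N = 190
S2 = 8.5704          # residual mean square, s^2
SS_XX = 19.2699
X_BAR = 3.295
ALPHA_HAT = -27.931
BETA_HAT = 17.173
T_CRIT = 1.973       # 97.5th percentile of t with 188 df, as quoted in the text


def predict(x_pred, new_observation=False):
    """Return (prediction, SE, 95% interval) at x_pred.

    With new_observation=False the SE is for the expected mean response;
    with new_observation=True the variance of a new deviation (s^2) is added.
    """
    fit = ALPHA_HAT + BETA_HAT * x_pred
    design_term = 1 / N + (x_pred - X_BAR) ** 2 / SS_XX
    extra = 1.0 if new_observation else 0.0
    se = math.sqrt(S2 * (extra + design_term))
    return fit, se, (fit - T_CRIT * se, fit + T_CRIT * se)


for x_pred in (2.45, 3.30, 4.13):
    fit, se_mean, ci_mean = predict(x_pred)
    _, se_new, ci_new = predict(x_pred, new_observation=True)
    print(f"x_pred={x_pred:.2f}  fit={fit:.3f}  "
          f"mean: SE={se_mean:.4f} CI=({ci_mean[0]:.3f}, {ci_mean[1]:.3f})  "
          f"new obs: SE={se_new:.4f} CI=({ci_new[0]:.3f}, {ci_new[1]:.3f})")
```

The interval for a new observation is much wider than that for the mean response at every
prediction point, because the background variance $s^2$ dominates the design term when N is large.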



So far, we have made predictions only within the observed range of explanatory variate 
values. This is known as interpolation, and is valid as long as the model fits the observed 
data well (in particular, if there is no evidence of model misspecification; see Chapter 13). 
As we have seen, the uncertainty about the predicted values increases (i.e. the SE increases) 
for prediction points towards the ends of the observed range of the explanatory variate. 
This is inherited from our uncertainty about the slope of the underlying relationship: the 
fitted line must pass through the mean of both variates, so that this point is fixed. A small 
change in the slope can then have a larger impact at the ends of the observed range than 
close to the mean of the explanatory variate. 

Of course, we can also make predictions outside the observed range of the explana- 
tory variate, known as extrapolation. We should be careful when doing this, however, 
because in addition to the increasing uncertainty associated with the fitted line, we have 
no indication of whether the true relationship follows the form of the extrapolated line. 
This is illustrated in Figure 12.6. Here, the solid circles represent the data used to fit the
SLR model indicated by the straight line. The open circles represent an additional sample
for smaller values of the explanatory variate and illustrate the main danger of extrapolation:
when considered over the extended range, the relationship is not a straight line, and
so, the fitted SLR gives poor predictions for the smaller values of
the explanatory variate. The same problems also occur for extrapolation in more complex
models. Regression should always be regarded as an empirical descriptive technique: the
fitted model describes the observed relationship between the response and explanatory
variables and, in the absence of further information, this description can be valid only
within the observed range of the explanatory variables.

FIGURE 12.6

Fitted SLR model ( — ) with 95% CIs ( — ) with data used to fit model (•), additional data not used to fit model (o)
and extrapolation of fitted line ( — ).

The value of the estimated intercept parameter can be interpreted as a prediction of the 
response for a zero value of the explanatory variate, i.e. $x_{pred}$ = 0. If zero is substantially
outside the observed range of the explanatory variate, then the intercept may be a substantial
extrapolation. We might then not be too concerned if the estimated intercept value seems to 
be biologically unrealistic: this merely implies that the fitted model is not valid over a wider 
range. As long as the fit seems plausible over the observed range, this is sufficient for the 
descriptive model to be regarded as adequate. In Section 12.9.1, we centre the explanatory 
variate to estimate the intercept (constant) parameter at a value within the observed range. 



12.6 Summarizing the Fit of the Model 

It is important to check the goodness of fit of a model. This is primarily required to ensure 
that the model provides a reasonable description of the observed data, and the graphical 
procedures described in Chapter 13 are a vital part of this process. However, it is also use- 
ful to have a simple numerical measure to compare different models for the same response 
variable. Goodness-of-fit statistics can be used to compare competing models based on dif- 
ferent explanatory variates, or different transformations of the same explanatory variate 
(as we will see in Chapters 14 and 17). When values of the explanatory variate are repli- 
cated, we can formally test whether the fit of our model is acceptable, and this is discussed 
in Section 12.8. 

Several goodness-of-fit statistics are available, each with advantages and disadvantages.
The most common statistics are the coefficient of determination ($R^2$) and the
adjusted coefficient of determination (adjusted $R^2$, or $R^2_{adj}$), sometimes expressed as the
percentage variance accounted for. These statistics are described below. Other statistics
are useful for regression models with several explanatory variates, and these are
described in Section 14.8.

The coefficient of determination, here denoted as $R^2$, measures the proportion of variation
in a data set that is accounted for by the fitted statistical model, calculated as the ratio
of the model sum of squares to the total sum of squares, i.e.

$$R^2 = \frac{\mathrm{ModSS}}{\mathrm{TotSS}}.$$

Because of the relationship between the three sums of squares in Equation 12.2, we can
rewrite this expression in terms of the residual and total sums of squares as

$$R^2 = \frac{\mathrm{TotSS} - \mathrm{ResSS}}{\mathrm{TotSS}} = 1 - \frac{\mathrm{ResSS}}{\mathrm{TotSS}}.$$

The coefficient of determination can take any value between 0 and 1, with larger values
indicating a closer fit of the model to the data. In the context of regression models with
several explanatory variates, this statistic has the major disadvantage that it does not take
account of the number of parameters estimated (see Section 14.8).

The adjusted coefficient of determination, denoted $R^2_{adj}$ (or adjusted $R^2$), is an alternative
goodness-of-fit statistic that takes into consideration both the number of observations, N,
and the number of estimated parameters. This statistic is calculated as

$$R^2_{adj} = 1 - \frac{\mathrm{ResMS}}{\mathrm{TotMS}} = 1 - \frac{\mathrm{ResSS}/(N - 2)}{\mathrm{TotSS}/(N - 1)} = R^2 - \left( \frac{1 - R^2}{N - 2} \right),$$

where TotMS = TotSS/(N - 1). For SLR, there is a linear relationship between $R^2$ and $R^2_{adj}$
that depends on the number of observations (N), and $R^2_{adj}$ takes values between -1/(N - 2)
(when $R^2$ = 0) and 1 (when $R^2$ = 1), with larger values indicating a better fit. Negative values
imply an extremely poor fit, with the intercept-only model providing a better fit than
the regression line. The adjusted statistic takes account of the number of parameters estimated
in the model, and so it can guard against over-fitting, as will be discussed further
in Section 14.8. Recall that ResMS is a measure of the variance not accounted for by the
model and TotMS can be considered as a measure of the total variance in the data, and so,
$R^2_{adj}$ can be interpreted as the proportion of the variance accounted for by the model. When
expressed as a percentage rather than a proportion (i.e. 100 × $R^2_{adj}$), this statistic is therefore
sometimes called the percentage variance accounted for. Within this book, we usually
report adjusted $R^2$ as a summary measure of fit for a model.

EXAMPLE 12.1F: DIPLOID WHEAT

The ANOVA for the SLR model for seed weight with explanatory variate seed length
was shown in Table 12.3. The coefficient of determination for this model is

$$R^2 = \frac{\mathrm{ModSS}}{\mathrm{TotSS}} = \frac{5683.1753}{7294.4090} = 0.779.$$

This is a fairly large value, suggesting a reasonable fit of the model to the data, which
can be verified by checking Figure 12.3, which shows a strong positive linear relationship
between seed weight and length. The adjusted coefficient of determination is

$$R^2_{adj} = 1 - \frac{\mathrm{ResMS}}{\mathrm{TotMS}} = 1 - \frac{8.5704}{38.5948} = 0.778,$$

which is very close to $R^2$ as N - 2 = 188 is very large and hence the adjustment
$(1 - R^2)/(N - 2)$ = 0.001 is small. The percentage variance in seed weight accounted
for by the linear regression model using seed length as an explanatory variate is
therefore 77.8%.
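
Both statistics follow directly from the entries of the ANOVA table. As a minimal sketch (the
function names are hypothetical, and the residual sum of squares is obtained by subtraction so
that the identity between the two forms of the adjusted statistic holds exactly in the code):

```python
def r_squared(mod_ss, tot_ss):
    """Coefficient of determination: proportion of TotSS explained by the model."""
    return mod_ss / tot_ss


def adjusted_r_squared(res_ss, tot_ss, n):
    """Adjusted R^2 for an SLR: 1 - ResMS/TotMS, with N - 2 residual df."""
    res_ms = res_ss / (n - 2)
    tot_ms = tot_ss / (n - 1)
    return 1 - res_ms / tot_ms


# Values from Table 12.3.
n = 190
mod_ss = 5683.1753
tot_ss = 7294.4090
res_ss = tot_ss - mod_ss   # by subtraction (Equation 12.2)

r2 = r_squared(mod_ss, tot_ss)
r2_adj = adjusted_r_squared(res_ss, tot_ss, n)
adjustment = (1 - r2) / (n - 2)

print(f"R^2          = {r2:.3f}")
print(f"adjusted R^2 = {r2_adj:.3f}")
print(f"adjustment   = {adjustment:.4f}")
print(f"percentage variance accounted for = {100 * r2_adj:.1f}%")
```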



12.7 Consequences of Uncertainty in the Explanatory Variate 

One of the basic assumptions of the linear model (presented in Section 12.1) is that the 
values of the explanatory variable(s) are known without error. In many cases, this assump- 
tion is not valid, as it is often impossible to ascertain the exact values of an explanatory 
variate. This problem is common in observational studies, in which values of explanatory
variates are usually outside the control of the investigator. For example, in Example 12.1,
all the variates are likely to be subject to some error in their measurements. It is important
to appreciate the consequences of this for the fitted model: it does not make the model
invalid, but it does change its interpretation.

When we fit a regression model, we think about fitting a model for the response in terms
of the true value of the explanatory variate(s). In the SLR, uncertainty in the explanatory
variate(s) acts to attenuate or dilute a regression relationship, so that the estimated slope
tends to be smaller in size, and the estimate of background variability increases. This
attenuation is less if the errors in the explanatory variate are small compared with the
underlying variability of the (unobserved) true values. This is demonstrated in Figure
12.7 for Example 12.1, where the original SLR fit for seed weight as a function of length is
shown in Figure 12.7a. In Figures 12.7b and c, the seed lengths have had random Normal
errors with standard deviations of 0.08 and 0.16 mm added to them (25% and 50% of the
sample standard deviation for seed length), and the model has been refitted (solid line).
The original fit is shown by the dashed line, and you can see that as error in the explanatory
variate increases, the spread of the observations also increases and the slope of the
fitted line decreases.
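
The attenuation effect is easy to reproduce by simulation. The sketch below does not use the
wheat data: it generates hypothetical "true" lengths and weights from a straight line whose
parameters loosely mimic Example 12.1 (the sample size, seed and distributional choices are
assumptions made for illustration), then adds Normal errors with standard deviations 0.08 and
0.16 to the explanatory variate and refits the line each time.

```python
import random


def fitted_slope(x, y):
    """Least-squares slope SS_xy / SS_xx for an SLR of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return ss_xy / ss_xx


rng = random.Random(2024)   # arbitrary fixed seed, for repeatability
n = 5000                    # large hypothetical sample, so sampling noise is small

# Hypothetical true lengths (mean 3.3, SD 0.32) and weights from a straight line.
true_x = [rng.gauss(3.3, 0.32) for _ in range(n)]
y = [-27.931 + 17.173 * xi + rng.gauss(0.0, 2.93) for xi in true_x]

slopes = {}
for error_sd in (0.0, 0.08, 0.16):
    # Observed explanatory variate = true value + measurement error.
    observed_x = [xi + rng.gauss(0.0, error_sd) for xi in true_x]
    slopes[error_sd] = fitted_slope(observed_x, y)
    print(f"error SD = {error_sd:.2f}: fitted slope = {slopes[error_sd]:.2f}")
```

The fitted slope should shrink towards zero as the error standard deviation grows, mirroring the
pattern described for Figure 12.7.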

When there are errors in the explanatory variate, the estimated regression line is therefore
a biased estimate of the true relationship, i.e. the underlying relationship between the
response and the true value of the explanatory variate. However, the estimated regression
line gives a valid estimate of the relationship between the response and the explanatory
variate as measured, i.e. with error. If the objective is prediction of the response for new
observations of the explanatory variate, and if these new values will be drawn from the
same population (with the same distribution of error on the explanatory variable), then the
fitted line is appropriate. In Example 12.1, our fitted model will be valid for prediction of
seed weight from length measurements of the same type. If our objective was prediction
of seed length given seed weight, then the role of our two variables should be reversed
and a different regression line would be obtained. If the objective is estimation of the
relationship between the true values of both variables, then more advanced techniques are
required (e.g. Carroll et al., 2006).



FIGURE 12.7

Seed weight plotted against (a) seed length, (b) seed length with Normal(0, 0.08²) errors added, (c) seed length
with Normal(0, 0.16²) errors added, each with fitted SLR model (solid line) and original SLR fit (dashed line).

One way to deal with measurement error is to ensure that it is minimized when the data
are collected. Technical replication (Section 3.1.1) can be useful in this context, with several
independent measurements of the explanatory variate made for each observation, and this
can reduce uncertainty due to both sampling variation and instrument or process variability.
The mean of the technical replicates can then be used as the value of the explanatory variate.

Measurement error is less important in the case of a designed experiment when the values
of the explanatory variate have been pre-defined. However, these errors may still be present.
For example, consider a field experiment with pre-defined levels of fertilizer application
to be applied to plots: although operators will do their best to apply the required quantity,
machinery is rarely precise enough to deliver this exactly. Similar problems of precision
often occur, albeit on a much smaller scale, in laboratory experiments. As long as the errors
can be regarded as random (rather than systematic), then no bias is introduced into estimates
of the slope parameter, although the estimate of background variation will still be inflated.



12.8 Using Replication to Test Goodness of Fit 

Replication is a basic principle of the statistical design of experiments and, as discussed 
in Chapters 3 and 4, involves the application of the same treatments to several indepen- 
dent experimental units. Differences between replicates with the same treatment give a 
direct estimate of background variability that arises from uncontrolled variation within 
the experimental process. For this reason, it is useful to have replication present wherever 
possible. In this section, we describe how to use replication to evaluate whether a SLR 
model gives an adequate representation of the observed data. 

In the previous sections, we estimated the background variation directly from the residu- 
als obtained after fitting a regression model. However, these residuals consist of two com- 
ponents that cannot be separated: variation of individual observations about the true but 
unknown trend (often called pure error); and systematic deviation of the fitted trend (here a 
straight line) from the true but unknown trend (commonly called lack of fit, and referred to 
earlier as model misspecification). When each of the values of an explanatory variate is repli- 
cated across different experimental units, these two components can be separated, as shown 
in Figure 12.8. The mean of replicate observations can be regarded as an estimate of the true
but unknown trend. Discrepancies between these mean values and the fitted regression line
can then be used to assess whether the model shows any evidence of lack of fit, and variation
between the replicate observations and their mean can be used to estimate pure error. This
gives the basis for an objective evaluation of the adequacy of the fitted model.

FIGURE 12.8

Partitioning the deviation into pure error and lack-of-fit components. Fitted SLR for response y with
observations (•), fitted values (o) and observed mean (•) for each level of explanatory variate (x).

EXAMPLE 12.2A: CROP TRANSECT BEETLE COUNTS* 

A pilot study was done to investigate the pattern of an insect pest (beetles) entering a 
susceptible crop. It was suspected that the beetles entered the crop from the edge of 
the field and then progressed towards the centre. One field was surveyed periodically 
and, once the beetles were present in reasonable numbers, a transect was taken from 
the edge towards the centre of the field with samples taken at 2 m intervals. At each 
distance, beetle counts were made from four randomly selected plants, giving replicate 
measurements at each distance. The data are presented in Table 12.5 and can also be 
found in file transect.dat. This file holds the distance into the crop in variate Distance 
and the corresponding beetle counts in variate Count. 

Because of heterogeneity of variances, it is conventional to analyse these counts on the 
logarithmic scale (see Chapter 6). Figure 12.9 shows the log10-transformed counts, which
suggest that a linear model on the log scale is plausible, but certainly should be checked 
for lack of fit. The variation between observations made on plants at the same distance 
represents pure error, and this variation is substantial. 

TABLE 12.5

Beetle Counts from Transect Sampling, with Four Plants
Sampled at Various Distances from the Edge of a Crop
(Example 12.2A and File transect.dat)

Distance from the               Plant
edge of crop (m)        1      2      3      4
        0              21     33     25     16
        2              19     20     17     19
        4               8     10      8      8
        6              12     10      6     22
        8              10      6      9     11
       10               9      9     13     13

FIGURE 12.9

Logged beetle counts (vertical axis) from replicate plants along a transect into the crop, plotted against
Distance (horizontal axis). Distances (m) are measured from the edge of the crop (Example 12.2A).

To explicitly identify the replicate observations, we must modify our notation to relabel
the units. We shall refer to the distinct values of the explanatory variate as levels, which
run from 1 to v. The kth level of the explanatory variate is represented as $x_k$ for k = 1 ... v,
and the number of replicate observations for that level is denoted as $n_k$, with at least two
replicates required for each level, i.e. $n_k > 1$. The response for the lth replicate observation
of the kth level of the explanatory variate is represented as $y_{kl}$, k = 1 ... v, l = 1 ... $n_k$. The
total number of observations is $N = n_1 + n_2 + \cdots + n_v$. The SLR model can be written with
this new labelling as



$$ y_{kl} = \alpha + \beta x_k + e_{kl} , \tag{12.4} $$

where $e_{kl}$ is the deviation corresponding to observation $y_{kl}$. Note that because we have a
single value of the explanatory variate for each level k, there is no need to have a second
index for this variable. As usual, this model is written in symbolic form as

Explanatory component: [1] + x

where [1] is a variate taking value 1 in all units, associated with the intercept parameter
$\alpha$, and x holds the values of the explanatory variate associated with the slope parameter
$\beta$. This model fits a straight line through the set of observations. To assess lack of fit, we
want to partition the residual term into a term that fits a separate mean for each level of
the explanatory variate plus deviations about this term. We can do this by fitting a factor,
denoted facx, in the explanatory model. The factor facx is defined so that different levels of
the factor correspond to distinct values of the explanatory variate, and can be interpreted
as a factor version of the explanatory variate. We therefore add this factor to the model, in
symbolic form, giving

Explanatory component: [1] + x + facx

In mathematical form, this can be written as

$$ y_{kl} = \alpha + \beta x_k + \kappa_k + e^*_{kl} , \tag{12.5} $$

where $\kappa_k$ represents an effect associated with level k of the explanatory variate, associated
with factor facx, and $e^*_{kl}$ is used to denote the deviations from this more complex model. In
fact, we have partitioned the model deviations from Equation 12.4 into a lack-of-fit component
that is common to each level of the explanatory variable, denoted as $\kappa_k$, and a pure
error component comprising the separate individual deviations about this common component,
denoted as $e^*_{kl}$, or

$$ e_{kl} = \kappa_k + e^*_{kl} . $$

The above model (Equation 12.5) is over-parameterized, as we have v + 2 parameters to
describe the pattern across v groups, and we shall discuss the implications of this later.

TABLE 12.6

Structure of the ANOVA Table for a SLR with Residual SS Partitioned into Lack-of-Fit and Pure
Error Components

Source of Variation   df      Sum of Squares   Mean Square             Variance Ratio      P
Model                 1       ModSS            ModMS = ModSS/1         F^M = ModMS/PEMS    Prob(F_{1,N-v} > F^M)
Residual
  Lack of fit         v - 2   LoFSS            LoFMS = LoFSS/(v - 2)   F^L = LoFMS/PEMS    Prob(F_{v-2,N-v} > F^L)
  Pure error          N - v   PESS             PEMS = PESS/(N - v)
Total                 N - 1   TotSS

The ANOVA table for this model is Table 12.6. The model sum of squares, ModSS, is the
same as from a SLR model. The sum of squares associated with the systematic component
of the deviations, and factor facx, is called the lack-of-fit sum of squares, abbreviated as
LoFSS. This sum of squares accumulates the squared differences between the fitted value
from the SLR and the mean of the replicates at each value of the explanatory variate, and
can be written as



$$ \mathrm{LoFSS} = \sum_{k=1}^{v} \sum_{l=1}^{n_k} (\hat{y}_{kl} - \bar{y}_{k.})^2 = \sum_{k=1}^{v} n_k (\bar{y}_{k.} - \hat{\alpha} - \hat{\beta} x_k)^2 , $$

where $\hat{y}_{kl}$ is the SLR fitted value for observation $y_{kl}$, and $\bar{y}_{k.}$ is the mean response for the
explanatory variate value $x_k$ or, in our previous notation,

$$ \bar{y}_{k.} = \frac{1}{n_k} \sum_{l=1}^{n_k} y_{kl} . $$

The LoFSS has v - 2 df, as v parameters are required to generate the mean values $\bar{y}_{k.}$
and two parameters are required to generate the fitted values $\hat{y}_{kl}$. The remaining variation
for this model arises from variation between replicates within each level of the
explanatory variate, and is known as the pure error sum of squares, PESS, which can
be calculated as

$$ \mathrm{PESS} = \sum_{k=1}^{v} \sum_{l=1}^{n_k} (y_{kl} - \bar{y}_{k.})^2 . $$



This sum of squares has N - v df. As usual, mean squares are calculated by division of
each sum of squares by its df. As the pure error mean square, denoted PEMS, is the best
estimate of background variation, this quantity is used as the denominator for calculation
of variance ratios.

The variance ratio $F^M$ = ModMS/PEMS is used to test the model via the null hypothesis
that there is no linear trend in the data, i.e. $H_0$: $\beta = 0$. Under this null hypothesis, this
variance ratio has an F-distribution with 1 and N - v df. If this variance ratio exceeds the chosen
critical value of this F-distribution, it indicates the presence of significant linear trend. The
second variance ratio, $F^L$ = LoFMS/PEMS, is used to test for lack of fit via the null hypothesis
that the deviations from the straight line, in Equation 12.5, are all zero, i.e. $H_0$: $\kappa_k = 0$,
k = 1 ... v. Under this null hypothesis, the variance ratio has an F-distribution with v - 2
and N - v df. If this variance ratio exceeds the chosen critical value of this F-distribution,
it indicates that deviations of the group means from the fitted straight line are not all zero,
and hence that we have some misspecification in the SLR model.

EXAMPLE 12.2B: CROP TRANSECT BEETLE COUNTS* 

A SLR model for log10-transformed beetle counts, logCount = log10(Count), shows a
statistically significant association with distance into the crop (F = 13.764, P = 0.001) and
accounts for 35.7% of the variation (adjusted R² = 0.357). To investigate the lack of fit, we
need to define a factor with six levels, one for each distance into the crop, and we call
this factor fDist (also given in file transect.dat). The model taking account of the lack
of fit can be written as

Response variable: logCount

Explanatory component: [1] + Distance + fDist

This model accounts for 59.9% of the variation, which is much larger than for the SLR 
model, and the summary ANOVA table is shown in Table 12.7. 

This table has partitioned the residual variation of the SLR (ResMS = 0.0259 with 22 
df) into that associated with the fDist factor, which assesses lack of fit with 4 df, and the 
remainder, associated with variation within distances or pure error with 18 df. The pure 
error estimate of background variation (PEMS = 0.0161) is substantially smaller than 
the SLR estimate; so, the variance ratio for the explanatory variate Distance increases 
(F^M = 22.073, P < 0.001). The lack-of-fit variation associated with the factor fDist is also
large compared with pure error (F^L = 4.321, P = 0.013), indicating significant deviations
from the fitted line that require further investigation. Figure 12.10 shows the fitted line 
and group means. 

The pattern of group means shows that there are more beetles within 2 m of the edge 
of the crop, and that the samples within the field (4-10 m from the edge) have smaller 
counts but do not continue to decrease with distance. The counts seem to fall into two 
groups rather than following the straight line required by linear regression, and this 
discrepancy is quantified by the lack-of-fit term. We can conclude that the SLR model is 
not suitable for these data and that further work is required to establish a better model. 
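The partition used in this example can be checked directly from the counts in Table 12.5.
The Python sketch below is an illustration rather than the analysis used for the book: the
function lack_of_fit_anova and its argument names are our own, and it computes only the sums
of squares and variance ratios (the P-values would additionally require an F-distribution
function, which the Python standard library does not supply).

```python
import math

# Beetle counts from Table 12.5: distance into the crop (m) -> counts on four plants.
counts = {
    0: [21, 33, 25, 16],
    2: [19, 20, 17, 19],
    4: [8, 10, 8, 8],
    6: [12, 10, 6, 22],
    8: [10, 6, 9, 11],
    10: [9, 9, 13, 13],
}


def lack_of_fit_anova(groups):
    """Partition the SLR residual SS into lack-of-fit and pure error parts.

    `groups` maps each distinct value of the explanatory variate to the
    list of replicate responses observed at that value.
    """
    x = [xk for xk, ys in groups.items() for _ in ys]
    y = [ykl for ys in groups.values() for ykl in ys]
    n, v = len(y), len(groups)
    xbar, ybar = sum(x) / n, sum(y) / n
    ssxx = sum((xi - xbar) ** 2 for xi in x)
    ssxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    tot_ss = sum((yi - ybar) ** 2 for yi in y)
    mod_ss = ssxy ** 2 / ssxx                      # SS explained by the straight line
    # Pure error: variation of replicates about their own level mean.
    pe_ss = sum((ykl - sum(ys) / len(ys)) ** 2
                for ys in groups.values() for ykl in ys)
    lof_ss = tot_ss - mod_ss - pe_ss               # what remains is lack of fit
    pe_ms = pe_ss / (n - v)
    lof_ms = lof_ss / (v - 2)
    return {
        "ModSS": mod_ss, "LoFSS": lof_ss, "PESS": pe_ss, "TotSS": tot_ss,
        "LoF_df": v - 2, "PE_df": n - v,
        "F_M": mod_ss / pe_ms, "F_L": lof_ms / pe_ms,
    }


log_counts = {d: [math.log10(c) for c in cs] for d, cs in counts.items()}
anova = lack_of_fit_anova(log_counts)
for name, value in anova.items():
    print(f"{name:7s} {value:.4f}")
```

The printed sums of squares and variance ratios should agree with Table 12.7 up to rounding.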



TABLE 12.7

ANOVA Table for log10(Beetle Counts) Testing for Lack of Fit (Example 12.2B)

Source of Variation   df   Sum of Squares   Mean Square   Variance Ratio   P
Model                  1   0.3562           0.3562        F^M = 22.073     < 0.001
Residual
  Lack of fit          4   0.2789           0.0697        F^L = 4.321      0.013
  Pure error          18   0.2905           0.0161
Total                 23   0.9256

FIGURE 12.10

Data (•), fitted regression line and group means for logged beetle counts from transect sampling at
distances (m) from the edge of the crop (Example 12.2B). The horizontal axis is Distance.

We stated above that the model in Equation 12.5 is over-parameterized, and this
arises because there are v + 2 parameters ($\alpha$, $\beta$, $\kappa_1 \ldots \kappa_v$) used to fit means for v groups.
Individual parameter estimates can therefore be difficult to interpret, although the fitted
values are equal to the observed group means. If there is no evidence of lack of fit, then
the SLR model can be fitted to obtain the usual interpretable parameter estimates. If
there is evidence of lack of fit, then the SLR model is not sufficient to explain the pattern.
In this case, explanatory model [1] + facx, based on just the factor version of the explanatory
variable, can be fitted to give an estimate of the response for each level to obtain
interpretable parameters. This factor-based model does not allow interpolation between
levels in the same way as a regression model based on a variate. A better alternative
would be to fit a regression model that follows the observed trend, perhaps a curved or
non-linear model as discussed in Chapter 17.

Lack of fit can also be examined in more complex models, such as polynomial models 
(Section 17.1), multiple regression (Chapter 14) or regression with groups (Chapter 15). In 
each context, the technique for assessing lack of fit is the same: to add a factor version of 
the explanatory variable into the model everywhere that the variate version appears; this
is illustrated in Example 18.4. Again, this should be done only where each value of the 
explanatory variable is replicated. 

The advantage of using replication within the context of simple or multiple linear regression
should now be clear: it allows for a quantitative assessment of model misspecification
in addition to more subjective assessments based on residual plots. Unfortunately, it is 
possible to use replication only where the values of the explanatory variate are under the 
control of or can be chosen by the experimenter, which does not apply to most observa- 
tional studies. 



12.9 Variations on the Model 

12.9.1 Centering and Scaling the Explanatory Variate 

Centering is a simple transformation of a variate that subtracts the sample mean from all
observations so that the transformed values have mean zero, i.e. they are centered about
zero. The variance and standard deviation of the centered variate remain identical to those
of the original variate. If zero is outside the range of the uncentered explanatory variate,
this transformation can make the intercept parameter more easily interpretable (see
Section 12.5). The centered model is written mathematically as

$$ y_i = \alpha^* + \beta (x_i - \bar{x}) + e_i , $$

which replaces the values of the explanatory variate with those same values after subtracting
their sample mean, $\bar{x}$. This model can be fitted by the use of a new variate defined as
$x_i^* = x_i - \bar{x}$ and by our rewriting the model as

$$ y_i = \alpha^* + \beta x_i^* + e_i . $$

The intercept parameter, now relabelled as $\alpha^*$, is the predicted response at value zero of
the centered explanatory variate, which corresponds to the sample mean of the uncentered
explanatory variate. Hence, the estimated intercept parameter is equal to the sample mean
of the response, $\hat{\alpha}^* = \bar{y}$, since the fitted SLR line passes through the point $(\bar{x}, \bar{y})$ for any
parameterization of the model. The estimated slope is unchanged.

As an alternative to centering, the explanatory variate may be standardized by subtraction
of the sample mean from each observation, and then division of this by the sample
standard deviation, $s_x$, as

$$ x_i^{**} = \frac{x_i - \bar{x}}{s_x} . $$

The SLR model is then rewritten in terms of the standardized variate as

$$ y_i = \alpha^* + \beta^* x_i^{**} + e_i . $$

The intercept here again represents the predicted response at the sample mean of the
explanatory variate, and the slope parameter, relabelled as $\beta^*$, represents the change in the
response for 1 unit change in the standardized explanatory variate, which is equal to a
change of 1 standard deviation in the original explanatory variate. This form of the model
is most useful when there are several explanatory variates (see Chapter 14) with very different
scales: using standardized variates makes the slope coefficients directly comparable
across different explanatory variates. For interpretation, it is often helpful to translate the
slope parameter for the standardized explanatory variate back into the units of the original
variate. This is done with the relationships

$$ \hat{\beta} = \hat{\beta}^* / s_x ; \quad \mathrm{SE}(\hat{\beta}) = \mathrm{SE}(\hat{\beta}^*) / s_x , $$

i.e. the estimated slope in terms of the original units is equal to the estimated slope for the
standardized variate divided by the unbiased sample standard deviation for the original
explanatory variate, because a change of 1 unit in the original variate is a change of only
$1/s_x$ units in the standardized variate. The estimated standard error is similarly divided by
the unbiased sample standard deviation.
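These relationships are easy to verify numerically. The Python sketch below uses a small
hypothetical data set (the values of x and y are invented for illustration, and fit_slr is a
helper written for this sketch) to fit the SLR with the original, centered and standardized
explanatory variate, and then compares the estimates.

```python
import math
import statistics


def fit_slr(x, y):
    """Return (intercept, slope, SE of slope) from a least-squares SLR fit."""
    n = len(x)
    xbar, ybar = statistics.mean(x), statistics.mean(y)
    ssxx = sum((xi - xbar) ** 2 for xi in x)
    ssxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = ssxy / ssxx
    intercept = ybar - slope * xbar
    res_ms = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    return intercept, slope, math.sqrt(res_ms / ssxx)


# Hypothetical data: zero lies well outside the range of x.
x = [10.0, 12.0, 15.0, 17.0, 20.0, 22.0, 25.0]
y = [3.1, 3.9, 4.8, 5.9, 6.8, 7.4, 8.9]

xbar = statistics.mean(x)
s_x = statistics.stdev(x)                      # unbiased (N - 1 divisor) standard deviation
x_centered = [xi - xbar for xi in x]
x_standard = [(xi - xbar) / s_x for xi in x]

a, b, se_b = fit_slr(x, y)                     # original variate
a_c, b_c, se_c = fit_slr(x_centered, y)        # centered variate
a_s, b_s, se_s = fit_slr(x_standard, y)        # standardized variate

print(f"original:     intercept {a:.4f}, slope {b:.4f} (SE {se_b:.4f})")
print(f"centered:     intercept {a_c:.4f}, slope {b_c:.4f} (SE {se_c:.4f})")
print(f"standardized: intercept {a_s:.4f}, slope {b_s:.4f} (SE {se_s:.4f})")
print(f"mean of y {statistics.mean(y):.4f}; back-transformed slope {b_s / s_x:.4f}")
```

Both transformed fits should return the sample mean of the response as the intercept, the
centered fit should leave the slope unchanged, and dividing the standardized slope and its
standard error by $s_x$ should recover the values in the original units.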



12.9.2 Regression through the Origin 

We might also consider a restricted form of the SLR model with the intercept parameter
set to zero, so that the response must be equal to zero when the explanatory variate is zero.
This model may arise in two different ways. In some circumstances, it may be asserted
from prior knowledge or expectations. This may be a reasonable biological assumption to
make, for example in an early-growth experiment where one might expect zero biomass to
correspond to zero shoot length. In other cases, the model may arise from interpretation
of the results of a SLR, when the fitted model suggests that zero is a plausible value for
the intercept parameter or, more specifically, when the null hypothesis that the intercept
parameter is equal to zero, i.e. $H_0$: $\alpha = 0$, cannot be rejected. If zero is within, or close to, the
range of the explanatory variate, and the overall pattern is clearly consistent with a zero
intercept, then it may be sensible to omit the intercept parameter if it is not statistically
significant. If zero is well outside the range, then we can only infer that the intercept of a
SLR model should be zero if it is also reasonable to assume that a common linear relationship
holds from zero up to the full observed range of the explanatory variate. Graphical
diagnostics (see Chapter 13) should always be used to ensure that omitting the intercept
has not introduced bias into the fitted model.

The new model, with $\alpha = 0$, is called regression through the origin, and takes the form

$$ y_i = \beta x_i + e_i , $$

where each term is described in Section 12.1. As with any other linear model, Assumptions
1 to 5 of Section 12.1 also apply here. We can write this model using symbolic notation by
omitting the constant term and specifying the explanatory variate alone as

Explanatory component: x

To estimate the slope parameter in this model, we must define some new statistics. The
uncorrected sums of squares for the response ($USS_{yy}$) and explanatory variable ($USS_{xx}$) and
the uncorrected sum of cross-products ($USS_{xy}$) are defined as

$$ USS_{yy} = \sum_{i=1}^{N} y_i^2 ; \quad USS_{xx} = \sum_{i=1}^{N} x_i^2 ; \quad USS_{xy} = \sum_{i=1}^{N} x_i y_i . $$

These quantities take a similar form to the sums of squares and cross-products defined
earlier as $SS_{yy}$, $SS_{xx}$ and $SS_{xy}$, but here we do not 'correct' the variables by subtraction of
their sample means.

The least-squares estimate of the slope, $\hat{\beta}$, for regression through the origin becomes

$$ \hat{\beta} = \frac{USS_{xy}}{USS_{xx}} . $$

This takes a similar form to the slope estimate from the standard SLR model, but here, the
uncorrected sums of squares are used in place of the corrected sums of squares.

The construction of the ANOVA table for this simplified model is also based on a partition
of the total sum of squares as in Equation 12.2; however, the calculations are now
based on uncorrected rather than corrected sums of squares. We modify the notation to
reflect this change, with the uncorrected total (TotUSS), model (ModUSS) and residual
(ResUSS) sums of squares defined as

$$ \mathrm{TotUSS} = USS_{yy} ; \quad \mathrm{ModUSS} = \frac{(USS_{xy})^2}{USS_{xx}} ; \quad \mathrm{ResUSS} = \mathrm{TotUSS} - \mathrm{ModUSS} . $$

TABLE 12.8

Structure of the ANOVA Table for Regression through the Origin

Source of Variation   df      Sum of Squares   Mean Square               Variance Ratio       P
Model                 1       ModUSS           ModUMS = ModUSS/1         F = ModUMS/ResUMS    Prob(F_{1,N-1} > F)
Residual              N - 1   ResUSS           ResUMS = ResUSS/(N - 1)
Total                 N       TotUSS

Since we have used uncorrected sums of squares, the degrees of freedom for each term 
must account for this. There is now no adjustment term present in the total or model sums 
of squares, so that their degrees of freedom are TotUDF = N and, as the regression model 
has one parameter, ModUDF = 1. The residual df can be obtained by subtraction as 

ResUDF = TotUDF - ModUDF = N-1. 



As usual, the sums of squares are divided by their degrees of freedom to form mean
squares, and the structure of the ANOVA table is shown in Table 12.8.

The observed F-statistic, calculated as the model mean square divided by the residual
mean square, can be used to evaluate the null hypothesis $H_0$: $\beta = 0$ against the alternative
hypothesis $H_1$: $\beta \neq 0$. Under the null hypothesis, this statistic follows an F-distribution
with 1 and N - 1 df. The estimated slope parameter has an expected value equal to the
unknown true value, $\beta$, with estimated variance

$$ \widehat{\mathrm{Var}}(\hat{\beta}) = \frac{s^2}{USS_{xx}} . $$

As before, the background variation is estimated from the residual mean square, with

$$ s^2 = \mathrm{ResUMS} = \mathrm{ResUSS}/(N - 1) . $$

Calculations of CIs for the estimated slope then follow the procedure in Section 12.4, but
using ResUDF = N - 1 to determine the appropriate t-distribution.

The fit of the regression through the origin can again be assessed by goodness-of-fit
statistics. A modified version of the coefficient of determination, known as the empirical
coefficient of determination and denoted as $R^2_{emp}$, is required and is defined as

$$ R^2_{emp} = 1 - \frac{\mathrm{ResUSS}}{\mathrm{TotSS}} . $$

This statistic contrasts the uncorrected residual sum of squares to the corrected total
sum of squares and can be compared with $R^2$ for models with an intercept, as both statistics
have the same denominator. However, because the comparison is no longer related to
a single partition of the total variation, $R^2_{emp}$ can now take negative values.
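The calculations for regression through the origin can be collected into a few lines of
code. The following Python sketch is a minimal illustration: the function name
regression_through_origin, the dictionary keys and both small data sets are hypothetical
and chosen only to show the arithmetic. The second data set has a decreasing trend, so a
line forced through the origin fits worse than the sample mean and the empirical
coefficient of determination is negative.

```python
import math


def regression_through_origin(x, y):
    """Fit y = beta * x by least squares and return the ANOVA quantities."""
    n = len(x)
    uss_xx = sum(xi * xi for xi in x)              # uncorrected SS for x
    uss_yy = sum(yi * yi for yi in y)              # uncorrected SS for y (TotUSS)
    uss_xy = sum(xi * yi for xi, yi in zip(x, y))  # uncorrected cross-products
    slope = uss_xy / uss_xx
    mod_uss = uss_xy ** 2 / uss_xx
    res_uss = uss_yy - mod_uss
    res_ums = res_uss / (n - 1)                    # estimate of background variation
    ybar = sum(y) / n
    tot_ss = sum((yi - ybar) ** 2 for yi in y)     # corrected total SS, used in R2_emp
    return {
        "slope": slope,
        "se_slope": math.sqrt(res_ums / uss_xx),
        "ModUSS": mod_uss, "ResUSS": res_uss, "TotUSS": uss_yy,
        "ResUDF": n - 1,
        "F": mod_uss / res_ums,
        "R2_emp": 1 - res_uss / tot_ss,
    }


# Hypothetical paired readings from two instruments.
x = [1.0, 4.0, 8.0, 12.0, 16.0, 20.0]
y = [0.9, 3.8, 8.1, 11.6, 15.9, 19.5]
fit = regression_through_origin(x, y)
print(f"slope {fit['slope']:.4f} (SE {fit['se_slope']:.4f}), "
      f"F {fit['F']:.1f} on 1 and {fit['ResUDF']} df, R2_emp {fit['R2_emp']:.4f}")

# Hypothetical decreasing trend: forcing the line through the origin fits badly.
poor = regression_through_origin([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
print(f"poorly fitting case: slope {poor['slope']:.4f}, R2_emp {poor['R2_emp']:.4f}")
```

With a single explanatory variate, the variance ratio returned here is the square of the
t-statistic formed as the slope divided by its standard error, which gives a quick check on
the calculation.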

EXAMPLE 12.3: AIR TEMPERATURE 

Measurements of air temperature (°C) were made at approximately 9 a.m. on 100 days
during 2006 (N = 100) with a standard glass mercury dry bulb thermometer and a new
electronic dry bulb thermistor probe. A subset of the data is presented in Table 12.9,
with the full set shown in Table A.2 and in file airtemp.dat.

TABLE 12.9

First Four and Last Four Measurements of Air Temperature
(°C) Made during 2006 Using a Standard Glass Mercury
Thermometer and a New Electronic Thermistor (Example
12.3, Full Data in File airtemp.dat and Table A.2)

Day Number   Mercury   Thermistor
     1          5.3        5.3
     7          6.6        5.5
     8          00         8.7
    13          6.9        6.7
   344         10.6       10.4
   349          4.9        4.0
   351         -1.5       -2.4
   353          0         -3.2

Source: Data from T. Scott and M. Glendining, Rothamsted Research.

The aim of the analysis is to model the new thermistor measurements (response 
variate Thermistor) in terms of the standard mercury measurements (explanatory 
variate Mercury) to investigate the relationship between them. It is of interest to 
determine whether the measurements are equivalent, i.e. whether a line that passes 
through the origin (thermistor reads zero when mercury reads zero) with slope equal 
to 1 is a plausible model for these observations. The measurements range between 
-3.2°C and 28.4°C and the scatter of points does appear to pass through the origin (see 
Figure 12.11). 

FIGURE 12.11

Scatter plot of air temperature measurements (°C) made by a standard glass mercury dry bulb thermometer and
a new electronic dry bulb thermistor (Example 12.3).

FIGURE 12.12

Air temperature measurements (°C) with fitted line from regression through the origin (Example 12.3).

We start by fitting the SLR model (Equation 12.1), which accounts for 98.6% of the variation
in the thermistor measurements (adjusted $R^2$ = 0.986), reflecting the very strong
linear relationship between the two sets of measurements. This model gives parameter
estimates $\hat{\beta}$ = 0.996 (SE = 0.0120), $\hat{\alpha}$ = -0.262 (SE = 0.1625) and the fitted model

$$ \widehat{\mathrm{Thermistor}}_i = -0.262 + 0.996\, \mathrm{Mercury}_i . $$



First, we examine the intercept. The t-statistic for testing the null hypothesis $H_0$: $\alpha = 0$ is

$$ t = \hat{\alpha} / \mathrm{SE}(\hat{\alpha}) = -0.262 / 0.1625 = -1.613 , $$

with 98 df. The observed significance level for this test is P = 0.110; so, there is no evidence
that the estimated intercept is different from zero and it appears reasonable to
drop this parameter and fit a regression through the origin. This model estimates the
slope as $\hat{\beta}$ = 0.979 (SE = 0.0059). The fitted line is shown in Figure 12.12, and the associated
ANOVA table is Table 12.10.

As in the SLR, there is strong evidence that the estimated slope is not equal to zero.
However, here, we are more interested in whether the slope is equal to 1, corresponding
to the null hypothesis $H_0$: $\beta = 1$. A t-statistic for testing this hypothesis against the two-sided
alternative $H_1$: $\beta \neq 1$ can be calculated as

$$ t = (\hat{\beta} - 1) / \mathrm{SE}(\hat{\beta}) = (0.979 - 1)/0.0059 = -3.597 , $$

with 99 df. The observed significance level for this test is P < 0.001; so, there is strong
evidence that the slope is not equal to 1 and we reject $H_0$. Residual plots for this model
based on standardized residuals are shown in Figure 12.13. The fitted values plot shows
a clear trend, with residuals tending to be negative for the lowest and highest temperatures,
and positive for intermediate values. This suggests some non-linearity in the
relationship which is also apparent in Figure 12.12 (and we discuss such patterns in the
residual plots in more detail in Section 13.3). There are also two very large residuals, one
at each end of the temperature range.

TABLE 12.10

ANOVA Table for Regression through the Origin for Thermistor Readings with Mercury Readings
as the Explanatory Variate (Example 12.3)

Source of Variation   df    Sum of Squares   Mean Square   Variance Ratio   P
Model                  1    17,698.267       17,698.267    27,397.119       < 0.001
Residual              99        63.953            0.646
Total                100    17,762.220

FIGURE 12.13

Composite set of residual plots based on standardized residuals from regression through the origin for air
temperature measurements (Example 12.3). Axes shown include fitted value, standardized residual and
Normal quantile.

Putting all these results together suggests that there is not a 1:1 relationship between the 
two types of measurement, and that the thermistor measurements can markedly deviate 
from the traditional method for temperatures around or below 0°C or above 25°C. 

Whether the thermistor can be used in practice as a substitute for the mercury readings
may depend on the context. If the mercury measurement is regarded as a gold
standard to be replicated, then the thermistor readings are clearly not adequate. On the 
other hand, if the required accuracy is less, and the likely range of use is within 5-20°C, 
then the small (although statistically significant) difference from the 1:1 relationship 
might not be of practical importance and thermistor readings may be acceptable (see 
Section 4.4 for a discussion of biological vs. statistical significance). 



One case of regression through the origin where additional care may be required 
occurs when the origin represents some initial or control condition. Observations made 
at, or very close to, the origin may then show little or no variation. For example, consider
a zero dose of inoculum in a disease-control experiment. The zero dose here acts both as
a negative control (in the sense of Section 3.1) and as an initial condition, and should con- 
sistently result in no symptoms present. The zero background variation at the zero dose 
will be inconsistent with natural variation found for positive doses, and so, the assump- 
tion of a common variance across all deviations (Assumption 2, Section 12.1) does not 
hold (also see the discussion in Section 8.5). One solution is to exclude the initial/con- 
trol condition (to avoid introducing bias into the estimate of background variation) and 
constrain the model to pass through the origin. Alternatively, if data are recorded as 
proportions or counts, then a GLM that accommodates such heterogeneity may be fitted 
(Chapter 18). 

12.9.3 Calibration 

The process of calibration, sometimes also called inverse regression or inverse prediction,
is required when scientists wish to use a quick or easy procedure to estimate a
quantity that is hard to measure directly. For example, in many laboratory procedures,
a target molecule can be labelled with a dye, and then light absorbance by the dye can
be directly related to the quantity in a sample. We will call the variable of interest the
target and the variable to be measured the substitute variable. The calibration procedure
uses known quantities of the target variable that span the range of interest (usually
with replication) and measures the outcome in terms of the substitute variable. A
regression model is then fitted with the substitute variable (which is subject to error)
as the response and the target variable (which uses known quantities) as the explanatory
variable. Here, we assume that the relationship is linear, but a similar procedure
can also be followed for curved or non-linear models. Calibration will only be accurate
if the regression relationship is a good fit, i.e. with adjusted $R^2$ close to 1, and with no
evidence of model misspecification. The fitted model is then used to make predictions
with confidence limits for the target variable given new measurements of the substitute
variable. In non-mathematical terms, this process derives a range of plausible values
for the target variable from the CIs for the fitted line, as shown in Figure 12.14 (also see
Draper and Smith, 1998, Section 3.2).




FIGURE 12.14

Inverse prediction for observed data (•) with fitted regression line and 95% CIs for a new observation.
For new measurement $y_{new}$, the predicted value of the explanatory variable, $\hat{x}_{new}$, occurs where the fitted line
equals $y_{new}$, with 95% confidence limits, $x_{low}$ and $x_{upp}$, obtained as the points at which the CIs equal $y_{new}$.

Suppose a calibration process leads to the fitted SLR model with predictions in the form

$$ \hat{y}(x_{new}) = \hat{\alpha} + \hat{\beta} x_{new} , $$

with estimated background variance $s^2$. If we have a new sample and take r independent
measurements from it (technical replicates), then the mean of these values, denoted $\bar{y}_{new}$,
gives us an estimate of the true value of the substitute variable, with associated variance $s^2/r$.
We assume that the process used to generate new measurements, and hence the associated
errors, is the same one used to construct the calibration line. We can plug this new value
into the prediction formula and rearrange it to get an estimate of the target variable as

$$ \hat{x}_{new} = \frac{\bar{y}_{new} - \hat{\alpha}}{\hat{\beta}} , $$

but we also need some measure of uncertainty in this estimate. The variance associated
with a prediction for a mean of r observations at value x takes the form

$$\mathrm{Var}(\hat{y}_{new}(x)) = s^2 \left( \frac{1}{r} + \frac{1}{N} + \frac{(x - \bar{x})^2}{SS_{xx}} \right) ,$$



and confidence limits can be formed from this value (as described in Section 12.5) for a
range of values of x. Fieller's theorem shows that the values of x at which the 100(1 - α)%
upper and lower confidence limits equal the value $\bar{y}_{new}$ give lower and upper 100(1 - α)%
confidence limits for the estimate $\hat{x}_{new}$. These limits are shown in Figure 12.14 and their
mathematical formula is

$$\bar{x} + \frac{\hat{x}_{new} - \bar{x}}{1 - g} \pm \frac{t_{N-2}^{[\alpha/2]}\, s}{\hat{\beta}(1 - g)} \sqrt{\frac{(\hat{x}_{new} - \bar{x})^2}{SS_{xx}} + (1 - g)\left(\frac{1}{r} + \frac{1}{N}\right)} ,$$

where

$$g = \left( \frac{t_{N-2}^{[\alpha/2]}}{\hat{\beta}/\mathrm{SE}(\hat{\beta})} \right)^2 .$$

These limits only exist when g < 1, which occurs when the t-statistic for the null hypothesis
$H_0$: $\beta = 0$ exceeds, in absolute value, the critical value $t_{N-2}^{[\alpha/2]}$ (see Section 12.4).

In building the calibration curve, it is important that the quantities of the target variable
are known without error; if these values are also subject to uncertainty, then the fitted
relationship will be subject to attenuation (see Section 12.7) and it would be better to regress
the target on the substitute variable and directly predict from that relationship.
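
The inverse prediction steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the calibration data are hypothetical, the function names (`fit_slr`, `inverse_prediction`) are our own, and the critical value of the t-distribution is supplied by the user (here 2.447, the two-sided 5% value on 6 df) rather than computed.

```python
import math

def fit_slr(x, y):
    """Least-squares fit of y = alpha + beta * x, returning the pieces needed below."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = s_xy / ss_xx
    alpha = y_bar - beta * x_bar
    res_ss = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    s2 = res_ss / (n - 2)  # residual mean square (background variance)
    return alpha, beta, s2, x_bar, ss_xx, n

def inverse_prediction(x, y, y_new_mean, r, t_crit):
    """Estimate of the target variable with Fieller-type confidence limits.

    y_new_mean is the mean of r technical replicates from the new sample;
    t_crit is the two-sided critical t-value on N - 2 df (taken from tables).
    """
    alpha, beta, s2, x_bar, ss_xx, n = fit_slr(x, y)
    x_hat = (y_new_mean - alpha) / beta
    g = t_crit ** 2 * s2 / (beta ** 2 * ss_xx)  # equals (t_crit / (beta / SE(beta)))^2
    if g >= 1:
        raise ValueError("limits do not exist: slope is not significantly different from zero")
    centre = x_bar + (x_hat - x_bar) / (1 - g)
    # abs(beta) keeps the half-width positive when the slope is negative
    half_width = (t_crit * math.sqrt(s2) / (abs(beta) * (1 - g))) * math.sqrt(
        (x_hat - x_bar) ** 2 / ss_xx + (1 - g) * (1 / r + 1 / n)
    )
    return x_hat, centre - half_width, centre + half_width

# Hypothetical calibration data: known quantities and measured signal
known = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
signal = [0.11, 0.52, 0.88, 1.33, 1.69, 2.12, 2.47, 2.93]

x_hat, lower, upper = inverse_prediction(known, signal, y_new_mean=1.50, r=3, t_crit=2.447)
print(f"estimate {x_hat:.3f}, 95% limits ({lower:.3f}, {upper:.3f})")
```

Note that the limits are not symmetric about the estimate unless g is negligible, which is one reason to use this form rather than a simple "estimate plus or minus two SEs" rule.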

EXERCISES 

12.1* An experiment was conducted to quantify the growth rate of transplanted cabbage
plants. Forty cabbage plants were transplanted and four plants (chosen
at random) were destructively sampled and the number of leaves present was
counted on the day they were transplanted and 8, 14, 21, 28, 35, 37, 42, 44 and 46
days afterwards. File meancabbage.dat contains sample numbers (ID), sample
times (variate Day) and the mean number of leaves per plant at each sample
(variate NLeaves). Fit a SLR and verify the estimates of the slope and intercept by
calculating the relevant sums of squares. Plot the fitted model and comment on
the quality of the fit. Give a 95% CI for the average growth rate over the period
(as leaves per plant per day). (We re-visit these data in Exercises 13.2 and 18.2.)

12.2 The Rothamsted Insect Survey collects insects using 12.2 m suction traps at
locations across the United Kingdom. As part of an investigation into long-term
changes in the abundance of flying insects, indices of the total biomass
collected per year (measured as wet weight) were created for 30 years from
1973 to 2002 for four locations (Shortall et al., 2009). The wet weights (variate
WetWeight, g) collected from the Hereford trap in each year (variate Year) are
held in file hereford.dat. Use a SLR to investigate whether there is evidence
of any linear trend over time in the log-transformed wet weights, calculated
as log10(WetWeight + 0.5), and summarize the strength of the relationship. Use
this model to predict the expected wet weight in 2010, and comment on the
reliability of this prediction. Plot the fitted model and consider whether there are
any aspects of the fit that you would wish to examine further. (We re-visit these
data in Exercises 13.4 and 15.1.)*

12.3* A pilot study investigated whether measurements of leaf length and width made
in the field could be used to accurately estimate leaf area. Twenty-five plants were
chosen at random from a plot of a single variety and the length (cm) and width
(cm) of the flag leaf on each plant was measured in situ. These leaves were then
detached from the plants and their area was measured (cm^2) using imaging
software. The data (variates Leaf, Length, Width, Area) are in file flagleaf.dat. Use
SLR to explore the relationship between leaf area and its estimate constructed as
length × width. Build and report a predictive model for leaf area, and critically
assess its performance. In principle, we would expect leaves with zero length
or width to have zero area; so, does it make sense to fit regression through the
origin in this context?

12.4 An experiment was conducted to identify varieties of willow with high yields
of dry matter. However, as accurate measurement of dry matter is time consuming,
the use of a surrogate variable is desirable and several such variables were
measured on a sample of 113 trees. File willowstems.dat holds the values of dry
matter (variate DryMatter) and several summary variables, including the length
of the longest stem (variate MaxLength), which is the simplest to measure. Fit a
SLR relating dry matter to the maximum stem length. Could we reasonably use
this as a surrogate variable? (We re-visit these data in Exercises 13.5 and 14.5.)†

* Data from R. Harrington and C. Shortall, Rothamsted Research.

† Data from I. Shield, Rothamsted Research.

12.5 The Rothamsted Insect Survey provided body mass (mg) and wing length (mm)
measurements for a sample of moths from the Noctuidae family caught in a
mercury-vapour trap at Rothamsted between 1999 and 2001 (Wood et al., 2009).
These data are held in file noctuid.dat and include unit numbers (ID), species
name (factor Species), wing length (variate WingLength) and body mass (variate
Mass) for each moth. The aim of this analysis is to predict body mass from wing
length. Use SLR to investigate the relationship between log10(Mass) and wing
length. As wing lengths were measured to the nearest mm, and there are several
observations at each distinct value of wing length, create a factor to test your SLR
for any evidence of lack of fit. What do you conclude? What other investigations,
if any, would you like to make? (We re-visit these data in Exercise 15.2.)*

12.6 A microarray study investigated genes associated with the senescence of leaves.
Forty-four plants were grown in a controlled environment and the seventh
leaf was excised from four of these plants at 2-day intervals from 19 to 39 days
after sowing (at the same point in the day/night cycle each time). The plants
were allocated to sample dates at random, with a CRD design. Four subsamples
(technical replicates) were taken from each leaf and allocated to separate
microarrays. File senescence.dat holds unit numbers (ID), design information
(variate Day, factor Biol Rep) and the expression value for three genes (variates
CATMA3A13560, CATMA2A31585 and CATMA1A09000) from each plant following
normalization and combination of the values for the four technical replicates.
Use SLR to predict the expression of gene CATMA3A13560 over time. Is
there any evidence of lack of fit to this relationship? (We re-visit these data in
Exercises 13.1 and 17.2.)†



* Data from J. Chapman, Rothamsted Research.

† Data from V. Buchanan-Wollaston (PRESTA), University of Warwick.

In Chapter 5, residual plots were used to investigate the assumptions underlying the linear
model for the case of a single qualitative variable (factor). As discussed and briefly
demonstrated in Chapter 12, all of these residual plots are also applicable to a model with a
single quantitative variable or variate (i.e. SLR), as well as to the more complex regression
models such as those described in Chapters 14 to 18. However, in models with quantitative
explanatory variables, the additional question arises as to whether a linear trend is a good
description of the relationship, and several diagnostic plots can be used to investigate this.
Concepts such as influence and leverage can also help to examine the impact of individual
points on the fitted line, and cross-validation techniques can quantify the predictive power
of the model. In this chapter, we introduce these concepts, and review and introduce some
techniques for checking the fit of regression models.

We start by considering the problem of model misspecification (Section 13.1) and formally
define some residual plots that were presented in Chapter 12. We then review the
different types of residuals first described in Section 5.1 and introduce two new types,
prediction and deletion residuals, that are particularly helpful in the context of regression
(Section 13.2). The use of residual plots in regression is then considered (Section 13.3), with
particular reference to checking for model misspecification. The concepts of leverage and
influence are then defined and discussed (Section 13.4). Finally, some simple cross-validation
techniques are introduced (Section 13.5).



13.1 Checking the Form of the Model 

The form of a SLR model asserts that the response changes as a straight line function of the
explanatory variate. Any systematic deviation from this form implies that the SLR model
is not appropriate, or that the model has been misspecified. The first step in any regression
should therefore consist of plotting the observations, i.e. the values of the response variate
against the values of the intended explanatory variate, to check whether a linear response
is plausible, as in Figure 12.2. If the relationship is clearly curved, then transformation
of the explanatory variate or a non-linear model should be considered (further details in
Chapter 17). If the relationship appears linear, then the model fitting may proceed and we
are then in a position to check the quality of our fitted model using numerical tools (such
as goodness-of-fit statistics, Section 12.6) and diagnostic plots. The fitted model plot, in
which the fitted model is superimposed on a plot of the observations, can be used to detect
model misspecification: if the data follow the form described by the model, then the fitted
line should reflect the trend in the observations across the full range of the explanatory
variate. Examples are shown for Example 12.1 in Figure 12.3 and for Example 12.3 in Figure
12.12. In both, there is a suggestion that the model does not fit well at the ends of the range:
in Example 12.1, the model appears to under-estimate the smallest and largest weights; in
Example 12.3, the model appears to over-estimate the smallest and largest temperatures.
The fitted value plot, in which the standardized residuals are plotted against the fitted
values (see Section 5.2 and below) can be used to investigate this pattern after removal of the
overall trend. In the SLR model, this is equivalent to plotting the standardized residuals
directly against the explanatory variate. This type of graph emphasizes deviations from
the fitted model, as can be seen in Figures 12.4 and 12.13 for Examples 12.1 and 12.3,
respectively. These graphs show clear curvature in the pattern of the residuals and hence some
evidence of model misspecification. A more complex form of model might be investigated
(see Chapter 17), especially if good prediction is required at the extremes of the
explanatory variate. However, some common sense is also required. In both cases, the deviations
from the fitted model are relatively small, and the model accounts for most of the variation
in the response (with adjusted $R^2$ of 0.778 and 0.986 for Examples 12.1 and 12.3,
respectively), and so, the SLR might be deemed adequate as a simple descriptive model. This is
not the case in the next example.



EXAMPLE 13.1: ELISA ABSORBANCE READINGS 

A set of eight ELISA readings were obtained for a series of increasing concentrations 
of a substrate. Here, the aim of the analysis is to describe the relationship between the 
absorbance reading and substrate concentration. The data are presented in Table 13.1 
and can be found in file elisa.dat. 

The absorbance is expected to be related to a power of the concentration and is hence
approximately linearly related to the logarithm of the concentration. For convenience, we
use the log10-transformation, with an offset (+1) included so that the background absorbance
(zero concentration) can be included in the model. However, the relationship between
absorbance and the log10-concentration is clearly not linear, as shown in Figure 13.1.

We already suspect that a SLR model is not appropriate for the data but, for the 
purposes of demonstration, we proceed with the analysis. The model in mathemati- 
cal form is 



$$\mathit{Absorbance}_i = \alpha + \beta \log_{10}(\mathit{Conc}_i + 1) + e_i ,$$

where the units are labelled with index i = 1 ... 8 with $\mathit{Absorbance}_i$ and $\mathit{Conc}_i$ being the
observed absorbance and concentration for the ith observation. If variate Absorbance
contains the ELISA absorbance readings, and variate log10Conc contains the log10-transformed
concentrations, then this model can be written in symbolic form as

Response variable: Absorbance

Explanatory component: [1] + log10Conc



TABLE 13.1

ELISA Readings (Absorbance) Obtained for Different Concentrations of a Substrate
(Example 13.1 and File elisa.dat)

Concentration    Absorbance
0                0.100
0.5              0.678
1                1.107
2                1.609
4                1.958
8                2.202
16               2.414
32               2.485

Source: Data from Rothamsted Research.



FIGURE 13.1

ELISA absorbance readings plotted against log10(concentration + 1) of a substrate (Example 13.1).

The fitted model accounts for 85.8% of the variation in the data (adjusted $R^2$ = 0.858),
with the parameter estimates listed in Table 13.2. The slope parameter is clearly
significantly different from zero ($t_6$ = 6.575, P < 0.001), indicating a strong linear trend in the
absorbance response with changes in the log10-concentration. If we were to stop here in
our investigation, we might think that this was a good representation of the data. But
when we see the fitted model and fitted value plots in Figure 13.2, we realize that the
straight line fits the data poorly. Although there is a trend present, in the sense that the
absorbance reading increases with log10-concentration, there is also strong curvature in
the relationship. Another model that accounts for the curvature must be sought for these
data, and several possibilities are described in Chapter 17.
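
The contrast between a strong fit summary and a poor fit to the shape of the data can be checked numerically. The following minimal Python sketch refits the SLR to the data of Table 13.1 using only the standard library; the variable names are our own, and the sums-of-squares formulas are the usual SLR ones. It should give estimates that agree with Table 13.2 up to rounding, while the signs of the simple residuals show the systematic pattern that reveals the curvature.

```python
import math

# ELISA data from Table 13.1
conc = [0, 0.5, 1, 2, 4, 8, 16, 32]
absorbance = [0.100, 0.678, 1.107, 1.609, 1.958, 2.202, 2.414, 2.485]

x = [math.log10(c + 1) for c in conc]  # log10(concentration + 1)
y = absorbance
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

beta = s_xy / ss_xx               # slope estimate
alpha = y_bar - beta * x_bar      # intercept estimate
fitted = [alpha + beta * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

res_ss = sum(e ** 2 for e in residuals)
s2 = res_ss / (n - 2)             # residual mean square on N - 2 = 6 df
se_beta = math.sqrt(s2 / ss_xx)
t_beta = beta / se_beta
adj_r2 = 1 - s2 / (ss_yy / (n - 1))

print(f"intercept {alpha:.4f}, slope {beta:.4f} (SE {se_beta:.5f}, t {t_beta:.3f})")
print(f"adjusted R-squared {adj_r2:.3f}")
# A run of residuals with the same sign at each end of the range suggests curvature
print("residual signs:", "".join("+" if e > 0 else "-" for e in residuals))
```

A summary such as the adjusted $R^2$ cannot distinguish random scatter from systematic curvature, which is why the residuals are inspected as well.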

It can also be helpful to plot the observed response against the fitted values, particularly 
for the models with several explanatory variates considered in Chapter 14. Departures 
from the 1:1 line then indicate a poor fit of the model to the data. We might be tempted to 
assess this formally by fitting a regression through the origin to this representation, but 
the model is measured better by the goodness-of-fit statistics described in Sections 12.6 
and 14.8. An alternative approach is cross-validation as described in Section 13.5. 



TABLE 13.2

Parameter Estimates with Standard Errors (SEs), t-Statistics (t) and Observed Significance Levels
(P) for a SLR Model for ELISA Absorbance Readings in Terms of Log10-Concentration of Substrate
(Example 13.1)

Term         Parameter    Estimate    SE         t        P
[1]          α            0.5458      0.19392    2.815    0.031
log10Conc    β            1.5284      0.23244    6.575    < 0.001



FIGURE 13.2

(a) Fitted model and (b) fitted values plot from SLR model for ELISA absorbance readings with
log10(concentration + 1) as the explanatory variate (Example 13.1).



In designed experiments, the values of the explanatory variable can be controlled and
are often replicated. It is then possible to formally investigate model misspecification using
the test for lack of fit introduced in Section 12.8.



13.2 More Ways of Estimating Deviations 

In Section 5.1, we presented the simple and standardized residuals as estimates of the
unknown deviations. These can be used to evaluate the validity of the underlying
assumptions about the distribution of the deviations. We revisit these definitions here in the
context of a SLR model, and we also introduce some new types of residuals that are useful for
models with quantitative explanatory variables.

Recall from Section 5.1 that the simple residuals, $\hat{e}_i$, are defined as the difference between
the observed responses and their fitted values. For the SLR model with

$$y_i = \alpha + \beta x_i + e_i ,$$

this takes the form

$$\hat{e}_i = y_i - \hat{y}_i = y_i - (\hat{\alpha} + \hat{\beta} x_i) ,$$

which is the difference between the response and the fitted straight line. The simple
residuals from a SLR model do not have a common variance, as the variance of residual $\hat{e}_i$
depends on the value of the explanatory variate, $x_i$. For this reason, it is important to use
standardized residuals, $r_i$, for regression models, which are defined as

$$r_i = \frac{\hat{e}_i}{\mathrm{SE}(\hat{e}_i)} ,$$

where $\mathrm{SE}(\hat{e}_i)$ is the estimated standard error of the ith simple residual. An explicit
expression for this standard error is given in Section 13.4.2. These residuals are sometimes known
as internally Studentized residuals. The standardized residuals are constructed to have
a common variance equal to 1 (unit variance). Note that the standardized residuals
apparently take the form of a t-statistic (an estimated quantity divided by its estimated SE, see
Section 2.4) but, because the numerator and denominator are not independent, they do
not follow a true t-distribution.

Both the simple and standardized residuals are estimated from the model fitted to the
full set of data; this has the disadvantage that if an individual observation strongly
influences the model fit, then its influence might not be detected. An alternative method based
on prediction residuals overcomes this problem. We obtain the ith prediction residual by
fitting the proposed model with the ith observation excluded, and then predict that
observation from the new fitted model. This approach highlights responses that do not follow
the general pattern of the model when it is fitted to the rest of the observations. Here we
use the subscript (i) to denote quantities calculated with the ith observation omitted, and
$\hat{y}_{(i)}$ denotes the prediction for the ith response from a model fitted excluding that
observation. The prediction residual for the ith observation, denoted as $\hat{e}_{(i)}$, is defined as the
difference between the response, $y_i$, and the predicted value, $\hat{y}_{(i)}$, i.e.

$$\hat{e}_{(i)} = y_i - \hat{y}_{(i)} .$$

We shall see an explicit formula for these residuals in Section 13.4.2. Since each
observation is not involved in fitting the model that provides its predicted value, these residuals
provide a valid measure of the predictive ability of a model. Further discussion and the
use of these residuals in cross-validation methods is presented in Section 13.5. However, as
for simple residuals, the prediction residuals do not have a common variance. We therefore
define the deletion residuals, $r_{(i)}$, as a standardized version of the prediction residuals,

$$r_{(i)} = \frac{\hat{e}_{(i)}}{\mathrm{SE}(\hat{e}_{(i)})} ,$$

where the estimated standard error $\mathrm{SE}(\hat{e}_{(i)})$ is also calculated from a regression analysis
that omits the ith observation. Deletion residuals are sometimes referred to as externally
Studentized residuals, and these residuals do follow a true t-distribution. Hence, as a
simple rule, observations with deletion residuals outside the range of ±2 can be identified
as potential outliers requiring further examination.

We have explained prediction and deletion residuals in terms of refitting the model to N
subsets of the data obtained by excluding each observation in turn. In practice, all the
quantities required can be calculated from the results of fitting the model to the full set. In
particular, the deletion residuals can be directly computed from the standardized residuals as

$$r_{(i)} = r_i \sqrt{\frac{N - p - 1}{N - p - r_i^2}} ,$$

where $r_i$ is the standardized residual, N is the sample size and p is the number of
parameters in the model (p = 2 for SLR models). Other forms are given at the end of Section
13.4.2. The deletion residuals follow a t-distribution with N - p - 1 df, i.e. N - 3 df for a
SLR model.

In general, we recommend the use of deletion residuals as they more readily identify
outlying observations that have had a strong influence on the fitted model. Further
discussion is presented in Section 13.4.
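
The equivalence between refitting with each observation omitted and the shortcut based on the standardized residuals can be verified directly. The sketch below is illustrative only: it uses the ELISA data of Table 13.1, our own helper `fit_slr`, and the standard error of a simple residual written in terms of the leverage (the explicit expression deferred to Section 13.4.2). The variance of each prediction residual is taken as the usual variance for predicting one new observation from the reduced fit.

```python
import math

def fit_slr(x, y):
    """Least-squares SLR fit; returns intercept, slope, ResMS, mean of x and SS of x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ss_xx
    alpha = y_bar - beta * x_bar
    res_ss = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    return alpha, beta, res_ss / (n - 2), x_bar, ss_xx

# ELISA data from Table 13.1
conc = [0, 0.5, 1, 2, 4, 8, 16, 32]
x = [math.log10(c + 1) for c in conc]
y = [0.100, 0.678, 1.107, 1.609, 1.958, 2.202, 2.414, 2.485]
N, p = len(x), 2

alpha, beta, s2, x_bar, ss_xx = fit_slr(x, y)
standardized, deletion_refit, deletion_shortcut = [], [], []

for i in range(N):
    # Standardized residual from the full fit, using the leverage of observation i
    h_ii = 1 / N + (x[i] - x_bar) ** 2 / ss_xx
    e_i = y[i] - (alpha + beta * x[i])
    r_i = e_i / math.sqrt(s2 * (1 - h_ii))
    standardized.append(r_i)

    # Brute force: refit without observation i, then predict it
    x_rest, y_rest = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
    a_i, b_i, s2_i, xbar_i, ssxx_i = fit_slr(x_rest, y_rest)
    pred_resid = y[i] - (a_i + b_i * x[i])
    var_pred = s2_i * (1 + 1 / (N - 1) + (x[i] - xbar_i) ** 2 / ssxx_i)
    deletion_refit.append(pred_resid / math.sqrt(var_pred))

    # Shortcut: computed from the standardized residual alone
    deletion_shortcut.append(r_i * math.sqrt((N - p - 1) / (N - p - r_i ** 2)))

for i in range(N):
    print(f"{i + 1}: r = {standardized[i]:6.3f}  "
          f"refit = {deletion_refit[i]:6.3f}  shortcut = {deletion_shortcut[i]:6.3f}")
```

The two deletion-residual columns should agree to rounding error, so no refitting is needed in practice.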



13.3 Using Graphical Tools to Check Assumptions 

In Section 5.2, a composite set of residual plots was used to investigate the validity of 
the assumptions underlying a model with a single explanatory factor. In that context, the 
issue of model misspecification did not arise, because the model fitted an effect for each 
treatment or group. In that case, the residuals provide an untainted estimate of the model 
deviations. For SLR, if the model is misspecified then the residuals comprise a mixture 
of two components: one corresponding to the unknown model deviations and the other 
corresponding to the discrepancy between the model and the form of the response. For 
this reason, the residual plots can be used to assess properties of the deviations only if the 
model gives a good representation of the observed trend. When the form of the model is 
adequate, then the composite set of residual plots described in Section 5.2 can be used for 
models with quantitative explanatory variables, including the more complex regression 
models presented in Chapters 14, 15 and 17. 

To recapitulate from Section 5.2, the residual plots can be used to check the assump- 
tions that the deviations have equal variances (homogeneity of variances) and that they 
are consistent with observations from a Normal distribution. We usually check homo- 
geneity of variance by plotting the standardized or deletion residuals, or their absolute 
values, against the fitted values from the model. In these graphs, variation is quantified as 
the vertical spread of the residuals: there should be no large change in this spread across 
the range of the fitted values (see Figure 5.2). If there is evidence of heterogeneity, then 
transformation of the response (Chapter 6) or a generalized linear model (Chapter 18) 
might be considered. The distribution of the residuals can be assessed with histograms 
and Normal probability plots (see Section 5.2.3). A histogram of residuals should show 
a symmetric, bell-shaped distribution, and the probability plot should yield an approxi- 
mately straight line, with greater conformance to the expected shape being required for 
larger sample sizes. 
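
The coordinates of a Normal probability plot are simple to compute by hand. The following sketch pairs the ordered residuals with expected Normal quantiles and summarizes the straightness of the plot by a correlation. The residual values are hypothetical, the function names are our own, and the plotting positions (k - 0.375)/(n + 0.25) are one common convention assumed here; statistical packages may use a slightly different rule.

```python
from statistics import NormalDist

def normal_plot_points(residuals):
    """Return expected Normal quantiles and the ordered residuals to plot against them."""
    n = len(residuals)
    ordered = sorted(residuals)
    # Plotting positions: an assumed convention, other choices exist
    probs = [(k - 0.375) / (n + 0.25) for k in range(1, n + 1)]
    quantiles = [NormalDist().inv_cdf(pr) for pr in probs]
    return quantiles, ordered

def correlation(u, v):
    """Sample correlation, used here as a rough summary of how straight the plot is."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    s_uv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    s_uu = sum((a - u_bar) ** 2 for a in u)
    s_vv = sum((b - v_bar) ** 2 for b in v)
    return s_uv / (s_uu * s_vv) ** 0.5

# Hypothetical deletion residuals from a fitted model
resid = [-1.62, -0.91, -0.48, -0.20, 0.05, 0.31, 0.66, 1.04, 1.35, 2.10]

quantiles, ordered = normal_plot_points(resid)
straightness = correlation(quantiles, ordered)
for q, r in zip(quantiles, ordered):
    print(f"{q:7.3f} {r:7.2f}")
print(f"correlation between quantiles and ordered residuals: {straightness:.3f}")
```

A correlation close to 1 corresponds to an approximately straight plot, but the graph itself remains the better guide, because it shows where any departure occurs.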



EXAMPLE 13.2A: DIPLOID WHEAT 

Recall that in Example 12.1, we fitted a SLR model to 190 diploid wheat seed weights 
with seed length as the explanatory variate. A subset of the data was presented in Table 
12.1 and the complete data set can be found in file triticum.dat and Table A.1. 

As noted above, the fitted model and fitted value plots suggested some evidence 
of model misspecification. However, the curvature in the data was small compared 
with the strong linear trend and so - with caution - we use the composite set of 
plots based on deletion residuals (Figure 13.3) to assess assumptions about the model 
deviations. 

For this large data set, these graphs are very similar to those from the same model
presented in Figure 12.4, which were plotted with standardized residuals. Variation
of the residuals appears reasonably constant across the range of the fitted values,
which accords with the assumption of homogeneity of variances. The histogram of the
residuals is symmetric and approximately bell shaped and the Normal plot is
approximately a straight line, which together indicate the consistency of the residuals with a
Normal distribution. The largest residual (corresponding to a fitted value of just under
37 mm) seems a little inconsistent with the rest of the distribution, although it does not
stand out in the Normal plot. Weight and length measurements of this seed should
perhaps be checked, but unless an error is found, it should be retained in the analysis
(outliers were discussed in Section 5.4).



FIGURE 13.3

Composite set of residual plots, based on deletion residuals, for a SLR model with seed weight as the response
and seed length as the explanatory variate (Example 13.2A).



As stated in Section 13.1, in a SLR model, the fitted value plot may be substituted by one 
of the residuals against values of the explanatory variate, but this is not the case for the 
more complex regression models discussed in Chapter 14, which have several explanatory 
variates. In these models, the residuals can be plotted against each explanatory variate in 
turn to look for model misspecification. If the model fits well, then the residuals should be 
distributed homogeneously around zero without any systematic pattern. 

The residuals can also be plotted against an additional explanatory variate that might 
help to explain the response, leading to a multiple regression model (see Chapter 14). If this 
graph shows a linear trend, then adding the new explanatory variate to the model might 
improve the fit. If the trend is non-linear, then transformation of the explanatory variable 
or a non-linear model (Chapter 17) might be required. Unfortunately, this graph gives a
biased impression of the contribution that the new variate would make to the model, but
an alternative graph, called an added variable plot, can be constructed to give an unbiased
picture (see Section 14.6).



13.4 Looking for Influential Observations 

When a SLR model is fitted, observations with more extreme values of the explanatory
variate can have a large impact on the fitted line, which may make the resulting
predictions unreliable. For example, if we drop one observation from the model and there is a
large change in the estimated slope parameter, then we should be concerned about the
robustness of the model. The basic concepts used to investigate this type of problem are
leverage and influence. Leverage is a measure that identifies the more extreme values of
the explanatory variate, which have the potential to be highly influential. However,
leverage cannot quantify whether the observation has had a large impact on the fitted model,
which is evaluated by its influence. The influence of an observation is a measure of the
change in the fitted values that would occur if that observation was omitted. These
concepts are illustrated for a SLR model in Figure 13.4.
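
The idea of dropping each observation in turn and recording the change in the estimated slope can be sketched as follows. The data are hypothetical and constructed for illustration: ten points lie exactly on a straight line, and one additional point has an extreme value of the explanatory variate and is inconsistent with that line. The function name `slope` is our own.

```python
def slope(x, y):
    """Least-squares estimate of the slope in a SLR."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xy / ss_xx

# Hypothetical data: ten points exactly on y = 2 + 0.5x, plus one extreme,
# inconsistent point at x = 20 (large leverage and large influence)
x = [float(v) for v in range(1, 11)] + [20.0]
y = [2 + 0.5 * v for v in x[:10]] + [2.0]

full_slope = slope(x, y)

# Change in the estimated slope when each observation is omitted in turn
changes = []
for i in range(len(x)):
    x_rest = x[:i] + x[i + 1:]
    y_rest = y[:i] + y[i + 1:]
    changes.append(slope(x_rest, y_rest) - full_slope)

most_influential = max(range(len(x)), key=lambda i: abs(changes[i]))
print(f"slope from all observations: {full_slope:.4f}")
for i, change in enumerate(changes):
    print(f"omit observation {i + 1:2d} (x = {x[i]:4.1f}): change in slope {change:+.4f}")
print(f"largest change occurs on omitting observation {most_influential + 1}")
```

Here the fitted line is pulled strongly towards the extreme point, so omitting it restores the slope of the underlying line, whereas omitting any other single observation changes the slope only slightly.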







FIGURE 13.4 

Leverage and influence of the highlighted point (o): (a) small leverage and influence; (b) small leverage and large
influence; (c) large leverage and small influence and (d) large leverage and influence. Black line represents the
'true' straight line relationship, and grey line represents the fitted model.

In each part of Figure 13.4, the black line represents the 'true' underlying straight
line relationship while the grey line represents the fitted model. Each set of data has 
10 observations in common (solid circles) with one additional highlighted observation 
(open circle) taking a different value in each plot. Figure 13.4a shows the highlighted 
point (o) with a value that gives it a small leverage (in the centre of the range of the 
explanatory variate) and with a small influence on the fitted line, which is then close 
to the true line. In contrast, the highlighted point in Figure 13.4b, while also having a 
small leverage, has exerted influence over the fit by increasing the value of the intercept 
(i.e. inducing a bias). In each of Figures 13.4c and d, the highlighted point has a large 
leverage. The highlighted point in Figure 13.4c has little influence on the fit, as it is con- 
sistent with the pattern in the rest of the data. In Figure 13.4d, the highlighted point is 
inconsistent with the rest of the data and has a large influence on the fitted line, causing 
changes in the estimates of both the intercept and slope parameters. Note that a point 
with large influence often appears inconsistent with the rest of the data, but might not 
appear as an outlier in residual plots if the fitted line is drawn towards it, as in Figure 
13.4d. These examples demonstrate that although leverage, influence and outliers are 
often closely related, this is not always the case. 



13.4.1 Measuring Potential Influence: Leverage 

As described above, in a SLR, leverage quantifies the distance between the value of 
an explanatory variate for a given observation and the sample mean of that variate. If 
an observation is an outlier with respect to the explanatory variate (i.e. it has a par- 
ticularly small or large value in comparison with the rest of the observations), then 
it is called a leverage point. Observations with large leverage can affect the fit of the 
model, but only if they are inconsistent with the overall trend. Therefore, a point with 
large leverage is not necessarily an influential point (e.g. Figure 13.4c). For this reason, 
leverage is most useful for assessing potential problems prior to analysis. For example, 
if an experimenter has control over values of the explanatory variate, then leverage 
can be assessed for different allocations (of value and replication) for the explana- 
tory variate. However, the leverages give further insight into the form of the residuals 
discussed in the previous section, and into calculations of influence, and so we give 
further details here. 

One common measure of leverage is called the hat-value. In a SLR model, the hat-values,
also known as leverages, give a measure of the distance of the ith value of the explanatory
variate, $x_i$, from its sample mean, $\bar{x}$, computed as

$$h_{ii} = \frac{1}{N} + \frac{(x_i - \bar{x})^2}{SS_{xx}} ,$$

where $SS_{xx}$ is the sum of squares for the explanatory variate (as defined in Section 12.2) and
N is the total number of observations. The name hat-value reflects the fact that these
leverages are related to the fitted values ('y-hat'). In fact, the ith fitted value, $\hat{y}_i$, can be expressed
as the sum of all N observed responses, identified as $y_j$, j = 1 ... N, multiplied by values
defined as

$$h_{ij} = \frac{1}{N} + \frac{(x_i - \bar{x})(x_j - \bar{x})}{SS_{xx}} .$$

The fitted value for the ith observation can then be expressed as

$$\hat{y}_i = \sum_{j=1}^{N} h_{ij}\, y_j .$$



These weights capture the extent to which the jth observation affects the ith fitted value.
If $h_{ij}$ is large, then the jth observation has a substantial influence on the ith fitted value.
The leverages, $h_{ii}$, therefore correspond to the influence that an individual observation has
over its own fitted value. Note that the form of the weights, $h_{ij}$, becomes more complex for
models containing several explanatory variates.

The leverage $h_{ii}$ can take values between 1/N and 1 (i.e. $1/N \le h_{ii} \le 1$) and the sum of
the leverages is always equal to the number of (independent) parameters in the model, p;
their average is therefore p/N in general, and so 2/N for SLR models. Observations with
large $h_{ii}$ values are identified as having more leverage and, as a rule of thumb, values of
$h_{ii} > 2 \times p/N$ are considered to be potential influential points. Leverages can be plotted
against an explanatory variate or the fitted values.
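
These properties of the leverages are easy to confirm numerically. The sketch below uses hypothetical data and our own function name `hat_weights`; it builds the weights $h_{ij}$ for a SLR with an intercept, extracts the leverages $h_{ii}$, applies the rule-of-thumb threshold 2 × p/N, and recovers the fitted values as weighted sums of the responses.

```python
def hat_weights(x):
    """Weights h_ij for a SLR with an intercept, as a list of rows."""
    n = len(x)
    x_bar = sum(x) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    return [[1 / n + (xi - x_bar) * (xj - x_bar) / ss_xx for xj in x] for xi in x]

# Hypothetical data with one extreme value of the explanatory variate
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 20.0]
y = [1.2, 1.9, 3.1, 4.2, 4.8, 6.1, 7.0, 8.2, 8.9, 19.5]

H = hat_weights(x)
N, p = len(x), 2
leverage = [H[i][i] for i in range(N)]

threshold = 2 * p / N                      # rule-of-thumb threshold
flagged = [i for i, h in enumerate(leverage) if h > threshold]

# Each fitted value is a weighted sum of all the observed responses
fitted = [sum(h_ij * y_j for h_ij, y_j in zip(row, y)) for row in H]

print(f"sum of leverages: {sum(leverage):.6f} (number of parameters p = {p})")
print(f"threshold: {threshold:.3f}")
for i in flagged:
    print(f"observation {i + 1} (x = {x[i]}) has leverage {leverage[i]:.3f}")
```

Because the leverages depend only on the values of the explanatory variate, this calculation can be made before any responses are measured, for example when comparing candidate designs.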



EXAMPLE 13.2B: DIPLOID WHEAT 

For the SLR model from Example 13.2A, the leverage threshold is 2 × p/N =
2 × 2/190 = 0.021. Figure 13.5 shows the leverages plotted against the explanatory
ate, with the threshold of 0.021 shown. This plot shows the quadratic relationship 
between the leverages and the explanatory variate in the SLR model, and that units with 
large leverages correspond to more extreme values of the explanatory variate. Clearly, 
most of the values are in the middle of the range with small leverage, but a few of the 
more extreme observations have leverages greater than the threshold of 0.021, with the 
maximum being 0.042. These observations are potentially influential points and should 
be further investigated by the influence measures described below. 




FIGURE 13.5 

Leverages plotted against the explanatory variate (seed length, mm) for SLR with seed weight as the response
(Example 13.2B). The horizontal line indicates leverage threshold (0.021).



13.4.2 The Relationship between Residuals and Leverages

We can write explicit expressions for the standardized, prediction and deletion residuals
defined earlier in terms of the leverages, which clarify the relationships between these
residuals. The estimated variance of the simple residual associated with the ith observation,
$e_i$, can be written in terms of its hat-value as

$$ \mathrm{Var}(e_i) = s^2 (1 - h_{ii}) , $$

and the estimated standard error is the square root of this variance. It follows that
uncertainty in the residual decreases as the leverage increases. This is because observations with
very large leverage tend to get fitted more closely than observations with small leverage.

The standardized residual, $r_i$, is calculated as the simple residual divided by its
estimated standard error, i.e.

$$ r_i = \frac{e_i}{\mathrm{SE}(e_i)} = \frac{e_i}{\sqrt{s^2 (1 - h_{ii})}} . $$

Since $\mathrm{SE}(e_i)$ is smaller for observations with large leverage, the standardized residuals of
these observations tend to be slightly inflated relative to observations with smaller
leverages. In practice, the range of leverages needs to be very large for this effect to become
noticeable.

The prediction residuals, $e_{(i)}$, can also be written more simply in terms of the simple
residuals and the leverages as

$$ e_{(i)} = y_i - \hat{y}_{(i)} = \frac{e_i}{1 - h_{ii}} . $$

So, we can obtain the prediction residual for the ith observation by re-scaling its simple
residual by one minus its leverage. The variance of the prediction residual takes the form

$$ \mathrm{Var}(e_{(i)}) = \frac{s_{(i)}^2}{1 - h_{ii}} , $$

where $s_{(i)}^2$ is equal to the residual mean square (ResMS) obtained from a SLR with the ith
observation omitted, and this variance can be directly calculated from the SLR results as

$$ s_{(i)}^2 = s^2 \left( \frac{N - p - r_i^2}{N - p - 1} \right) . $$
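
The following Python sketch (standard library only) illustrates these relationships under stated assumptions: it computes the simple, standardized and prediction residuals and the deleted variance $s_{(i)}^2$ from a single SLR fit, and then checks two of them against a brute-force refit with one observation omitted. The function names and the small data set are hypothetical, invented for illustration.

```python
import math


def fit_slr(x, y):
    """Least-squares intercept and slope of a simple linear regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residual_summary(x, y):
    """Residuals of each type computed from one fit to the full data."""
    n, p = len(x), 2
    a, b = fit_slr(x, y)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    h = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]       # leverages
    e = [yi - (a + b * xi) for xi, yi in zip(x, y)]         # simple residuals
    s2 = sum(ei ** 2 for ei in e) / (n - p)                 # ResMS
    r = [ei / math.sqrt(s2 * (1 - hi)) for ei, hi in zip(e, h)]
    e_pred = [ei / (1 - hi) for ei, hi in zip(e, h)]        # prediction residuals
    s2_del = [s2 * (n - p - ri ** 2) / (n - p - 1) for ri in r]
    return e, h, s2, r, e_pred, s2_del


# Hypothetical data, invented for illustration
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 9.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 16.0]
e, h, s2, r, e_pred, s2_del = residual_summary(x, y)

# Brute-force check for the last observation: refit the SLR without it
i = len(x) - 1
a_i, b_i = fit_slr(x[:i], y[:i])
brute_pred_resid = y[i] - (a_i + b_i * x[i])
brute_s2 = sum((yj - (a_i + b_i * xj)) ** 2
               for xj, yj in zip(x[:i], y[:i])) / (len(x) - 1 - 2)

print(abs(brute_pred_resid - e_pred[i]) < 1e-9)
print(abs(brute_s2 - s2_del[i]) < 1e-9)
```

The refit is only there as a check; in practice the shortcut expressions avoid fitting the model N times.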



The deletion residual (Section 13.2) can then be expressed in terms of the other residuals as

$$ \frac{e_{(i)}}{\mathrm{SE}(e_{(i)})} = \frac{e_i}{\sqrt{s_{(i)}^2 (1 - h_{ii})}} = r_i \sqrt{\frac{N - p - 1}{N - p - r_i^2}} . $$



13.4.3 Measuring the Actual Influence of Individual Observations

While leverage indicates the potential of individual observations to have a large impact 
on the fitted model, influence statistics measure the actual impact of each observation 
on the fitted model, and are hence generally more useful. Influence statistics help us to
detect those individual observations that truly affect the fitted model, known as influential
points. An influential point may affect one or more aspects of the fitted model. For
example, in Figure 13.4b, the highlighted point affects the estimated intercept but not the 
slope; in Figure 13.4d, the highlighted point affects the estimates of both the intercept and 
slope. In both cases, the fitted model is changed, and in this section, we present some com- 
mon influence statistics that measure the impact of individual observations on the overall 
fit of the model via changes in the fitted values. 

Cook's statistic, $D_i$, measures the influence of an individual observation in terms of the
change in the fitted values that would occur if that observation were omitted. For the ith
observation, this statistic can be computed as

$$ D_i = \frac{r_i^2}{p} \left( \frac{h_{ii}}{1 - h_{ii}} \right) , $$

where $r_i$ is the standardized residual, $h_{ii}$ is the leverage for the ith observation, and p is
the number of (independent) parameters in the model. Larger values of $D_i$ correspond to
observations with more influence on the fitted values. As a rule of thumb, values of $D_i > 1$
indicate influential points.

A modified form of Cook's statistic can be more useful for diagnostic plots, because
its values can be used in half-Normal plots, where deviations in the tail of the distribution
indicate the presence of potentially influential points. This modified statistic, $C_i$, is
defined as

$$ C_i = \left| \frac{e_{(i)}}{\mathrm{SE}(e_{(i)})} \right| \sqrt{ \frac{N - p}{p} \left( \frac{h_{ii}}{1 - h_{ii}} \right) } , $$

which uses the deletion residual, $e_{(i)}/\mathrm{SE}(e_{(i)})$, rather than the standardized residual (Atkinson,
1984). A plot of the modified Cook's statistic against the explanatory variate(s) or fitted
values can identify the location of influential points that exceed the threshold value of
$C_i > 2\sqrt{(N - p)/N}$.

Figure 13.6 shows the modified Cook's statistics plotted against the fitted values and as a
half-Normal plot (see Section 5.2.3) for the data of Figure 13.4c (large leverage, small
influence) and Figure 13.4d (large leverage, large influence). When the highlighted observation
has a small influence, all the $C_i$ values fall below the threshold of $1.81 = 2\sqrt{(11 - 2)/11}$
(Figure 13.6a) and the half-Normal plot shows an approximately straight line (Figure
13.6b). When the highlighted observation has a large influence, it has a very large value
of $C_i = 6.87$, substantially exceeding the threshold value (Figure 13.6c), and there is clear
deviation from a straight line pattern in the half-Normal plot (Figure 13.6d).
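
To make the two influence statistics concrete, here is a hedged Python sketch (standard library only) that computes $D_i$ and $C_i$ for a SLR, taking the deletion residual as the prediction residual divided by its standard error. The function names and the data are hypothetical: ten points close to a straight line plus one high-leverage point lying well off that line, loosely in the spirit of Figure 13.4d but not the data behind it.

```python
import math


def slr_influence(x, y):
    """Leverages, Cook's statistics D_i and modified statistics C_i for a SLR."""
    n, p = len(x), 2
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    e = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    h = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    s2 = sum(ei ** 2 for ei in e) / (n - p)
    cook, modified = [], []
    for ei, hi in zip(e, h):
        r = ei / math.sqrt(s2 * (1 - hi))                   # standardized residual
        t = r * math.sqrt((n - p - 1) / (n - p - r ** 2))   # deletion residual
        ratio = hi / (1 - hi)
        cook.append(r ** 2 / p * ratio)
        modified.append(abs(t) * math.sqrt((n - p) / p * ratio))
    return h, cook, modified


def modified_cook_threshold(n, p):
    """Rule-of-thumb threshold 2 * sqrt((N - p) / N) for C_i."""
    return 2 * math.sqrt((n - p) / n)


# Hypothetical data: the last point has large leverage and lies off the line
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 20.0]
y = [1.2, 1.9, 3.1, 4.0, 4.8, 6.1, 7.2, 7.9, 9.1, 9.9, 8.0]
h, cook, modified = slr_influence(x, y)
limit = modified_cook_threshold(len(x), 2)

print(round(limit, 2))
print([i for i, c in enumerate(modified) if c > limit])
```

With N = 11 and p = 2 the threshold reproduces the value of 1.81 quoted above, and with N = 190 it gives 1.99.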

FIGURE 13.6

Modified Cook's statistics plotted against fitted values from the SLR in (a) Figure 13.4c and (c) Figure 13.4d;
half-Normal plots of modified Cook's statistics from the SLR in (b) Figure 13.4c and (d) Figure 13.4d.



Influential points might be outliers, but they do not necessarily appear as outliers in
residual plots because the model fit has been adapted to accommodate them. In extreme
cases, observations with both large leverage and large influence may be fitted almost
exactly, giving a very small residual and possibly causing distortion elsewhere in the
model. The investigation, and treatment, of influential points should be similar to that for
other potential outliers, as discussed in Section 5.4. An important difference, in the context
of regression models, is that we know that if we omit the influential observations, then the
fitted model will change. Such influential observations should be checked for errors - in
either the response or the explanatory variate or both - and corrected if necessary. If there
is no evidence of any mistake, then the presence of the influential observations might
indicate that the model is inadequate to explain the relationship: another explanatory variate
might be required, or the shape of the relationship might be wrong. The influential
observations should not be removed from the dataset without good reason, and any such action
should be documented and reported. It may be helpful to consider the fit with and without
any highly influential observations (possibly omitting potential outliers one at a time).



EXAMPLE 13.2C: DIPLOID WHEAT 

The modified Cook's statistics for the SLR model of Example 13.2A are plotted against
the fitted values and as a half-Normal plot in Figure 13.7.



FIGURE 13.7

Modified Cook's statistics from SLR for seed weight with explanatory variate seed length: (a) plotted against
fitted values and (b) a half-Normal plot (Example 13.2C).



There are N = 190 observations, and the SLR model has p = 2 parameters, giving a
threshold value of $2\sqrt{(190 - 2)/190} = 1.99$. Using this threshold, we might identify six
to eight influential points, and the plot of the modified Cook's statistics against the fitted
values shows that most of the influential points are the seeds with the smallest and
largest predicted weights. The half-Normal plot appears curved rather than straight, and the
eight largest values are not convincingly part of this trend. We know from our previous
analysis (Figure 13.3) that there is some doubt about the fit of the model in these more
extreme regions of the explanatory variate (seed length); the influence statistics now
suggest that these regions may also affect the overall fit of the model. However, if we
omit the eight points whose modified Cook's statistic exceeds the threshold, then the
change in the fitted line is small: the intercept decreases by 0.15 and the slope increases
by 0.006 units; both changes are small compared with the standard errors of the
estimated parameters (see Table 12.4). We can conclude that collectively, these influential
points have only a small impact on the overall fit of the model.



13.5 Assessing the Predictive Ability of a Model: Cross-Validation 

Cross-validation methods are used to assess the predictive ability of a model, and they 
can also provide an effective basis for choosing between competing models (see Section 
14.9.3). Critical evaluation of a model is vital, as it enables limitations to be detected. This 
knowledge is especially important when the quality of real-life decisions depends on the 
reliability of predictions. For example, if a model is developed to predict contamination of 
grain via a sampling procedure, then the predictions must be accurate: if contamination is 
over-estimated, then good grain will be wasted; if it is under-estimated, then food quality 
might be compromised. To provide a realistic picture of its performance, a model should 
ideally be evaluated with an independent set of data, i.e. not the data on which the model 
was developed and fitted. The fitted model will tend to adapt to quirks in the original data 
that may not be representative of a wider population; so, the independent data should be 
representative of the population to which the model is going to be applied. The original 
data to which the model was fitted and the independent data used to test the model are 
commonly called the training and validation sets, respectively. 

The cross-validation process consists of two steps. In the first step, a model is fitted to
the training set. In the second step, this fitted model predicts the response for
observations in the validation set. The differences between the observed and predicted values
in the validation set, which we call discrepancies, may give some indication of ranges of
the explanatory variable(s) for which predictions are reliable, and those for which they
are unreliable. Note that the discrepancies are not the same as model residuals: residuals
arise from the set of observations used to fit the model, whereas discrepancies arise from
an independent set of observations not used in fitting the model. The overall predictive
ability of the model may be quantified by summaries of these discrepancies in terms of
statistics measuring bias or precision, which are described below. If the model is
considered acceptable after the cross-validation process is complete, then a final model is often
fitted from the combined training and validation sets.

In practice, it is often difficult or impractical to obtain further independent data and
so, for the purposes of proper model validation, the original data may be split into two
distinct subsets that form the training and validation sets. This cross-validation method is
known as data splitting. At least half of the data will usually be allocated, at random, to
the training set. The partitioning process has several drawbacks - both the quality of the
fitted model and the results of the cross-validation may depend on the partition selected,
particularly when the number of observations in either subset is small. The optimal
partition depends on the context: a larger training set may produce a more robust model, but
may not leave sufficient observations for reliable validation.

When there are too few data to be divided into two subsets, the leave-1-out
cross-validation method can be used. Here, the training set (of size N - 1) contains all but one of
the observations, which becomes the validation set (of size 1). This procedure is repeated
for each observation in turn. Hence, the model is fitted N times: first, the model is fitted
to the data with the first observation omitted and the response for the first observation is
predicted; then the same procedure is followed for the second observation, and so on. In
this case, the discrepancies are equal to the prediction residuals defined in Section 13.2.
Another variant of this procedure, known as the leave-k-out cross-validation, splits the
data into subsets of size k and uses these as validation sets with training sets of size N - k
(the remaining observations). Again, the model is fitted for each training set and predicts
the response in each validation set. A third variant, known as k-fold cross-validation,
splits the data into k subsets of approximately equal size and uses each subset in turn as
the validation set.
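
The partitioning schemes described above can be sketched in Python (standard library only) as follows. This is an illustrative outline under stated assumptions rather than a prescribed implementation: the helper names, the random seed and the data are all hypothetical. The same function handles k-fold and leave-1-out cross-validation, since leave-1-out simply uses N validation sets of size 1.

```python
import random


def fit_slr(x, y):
    """Least-squares intercept and slope of a simple linear regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def k_fold_indices(n, k, seed=0):
    """Randomly split unit indices 0..n-1 into k subsets of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def discrepancies(x, y, folds):
    """Observed minus predicted response; each unit is predicted from a
    model fitted without the validation subset that contains it."""
    out = [None] * len(x)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(len(x)) if i not in held_out]
        a, b = fit_slr([x[i] for i in train], [y[i] for i in train])
        for i in fold:
            out[i] = y[i] - (a + b * x[i])
    return out


# Hypothetical data, invented for illustration
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [2.3, 3.8, 6.1, 8.2, 9.7, 12.4, 13.8, 16.1, 18.3, 19.6]

three_fold = discrepancies(x, y, k_fold_indices(len(x), k=3))
leave_1_out = discrepancies(x, y, [[i] for i in range(len(x))])
print([round(d, 3) for d in leave_1_out])
```

Leave-k-out cross-validation would use the same `discrepancies` function with validation subsets of fixed size k instead of a fixed number of subsets.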

In all these cross-validation methods, the predictive ability of the model is assessed on
the discrepancies between the observed values and predictions made for the validation
set. If these discrepancies are small, then the model is deemed to have good predictive
ability. To define summary statistics that quantify the discrepancies, we need to identify
the validation set separately from the training set. We illustrate the approach for the case
of a SLR model, but the procedure is similar for more complex models. We denote the
number of observations in the validation set as M and represent these observations as
$Y_1 \ldots Y_M$, with $X_1 \ldots X_M$ as the associated values of the explanatory variate. From fitting a SLR
model to the training set, we obtain parameter estimates $\hat{\alpha}$ and $\hat{\beta}$. These can be used to
form predictions for the validation set as

$$ \hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i $$

for all observations, i = 1 ... M. The discrepancies, $Y_i - \hat{Y}_i$, measure the predictive ability of
the model on the validation set. Plotting the discrepancies against the explanatory variate,
$X_i$, might indicate specific ranges within which the model gives poor predictions.

Several statistics have been devised to summarize overall predictive ability. Here, we
measure bias with the prediction bias statistic and we measure precision with the mean
absolute difference and the square root of the mean square error. The prediction bias (PB)
of the fitted model is estimated as the mean of the discrepancies across all observations in
the validation set, written as

$$ \mathrm{PB} = \frac{1}{M} \sum_{i=1}^{M} (Y_i - \hat{Y}_i) . $$



The PB takes a positive value if the predictions persistently under-estimate the observed
response, and a negative value if they persistently over-estimate the observed response.
If the predictions are unbiased, then the PB should be close to zero, but note that this can
also occur if positive bias in one area is cancelled out by negative bias in another; so, the
numerical summary should be interpreted alongside a graph of the discrepancies against
the fitted values. This cancelling out cannot happen with the mean absolute difference
(MAD), which is calculated as the mean of the absolute discrepancy values, i.e.

$$ \mathrm{MAD} = \frac{1}{M} \sum_{i=1}^{M} \left| Y_i - \hat{Y}_i \right| . $$



The MAD is close to zero only if most of the discrepancies are small. A related measure,
the mean square error of prediction (MSEP), is the mean of the squared discrepancies,
written as

$$ \mathrm{MSEP} = \frac{1}{M} \sum_{i=1}^{M} (Y_i - \hat{Y}_i)^2 . $$

This quantity is analogous to the ResMS from fitting the SLR model; the difference here
is that estimation and prediction use different sets of data. If the predictive ability of the
model is good, then the MSEP will be similar in size to the ResMS, on the assumption that
the background variation is similar in the training and validation sets. The square root of
the MSEP, known as the root mean square error (RMSE),

$$ \mathrm{RMSE} = \sqrt{\mathrm{MSEP}} , $$

is a common alternative to the MSEP. Both the MAD and RMSE take positive values, with
large values indicating a model with poor predictive ability. Both statistics are easily
interpreted as they are on the same scale as the observations; the major difference between
them is that the RMSE gives more weight to large discrepancies. These statistics are
especially useful for comparisons of different models. It can also help to express the PB, MAD
and RMSE as a percentage relative to the average response, $\bar{Y}$, from the validation data
set. For example, $\mathrm{PB}\% = 100 \times \mathrm{PB}/\bar{Y}$ represents the prediction bias as a percentage of the
average response.

If you are concerned that the predictive ability of the model might change for specified
subgroups, for example for large, medium or small values of the explanatory variate, then
it may be appropriate to partition the validation set according to this criterion and to
calculate and compare these summary statistics for each subgroup separately.



EXAMPLE 13.2D: DIPLOID WHEAT

In Examples 13.2A to 13.2C, we found some evidence of model misspecification in
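
The summary statistics defined above translate directly into code. The Python sketch below (standard library only) is a minimal illustration: the function name, the dictionary keys and the four observed and predicted values are hypothetical, chosen so that positive and negative discrepancies cancel in the PB but not in the MAD or RMSE.

```python
import math


def prediction_summaries(observed, predicted):
    """PB, MAD, MSEP and RMSE of the discrepancies (observed - predicted),
    plus the PB, MAD and RMSE as percentages of the mean observed response."""
    disc = [y - y_hat for y, y_hat in zip(observed, predicted)]
    m = len(disc)
    pb = sum(disc) / m
    mad = sum(abs(d) for d in disc) / m
    msep = sum(d ** 2 for d in disc) / m
    rmse = math.sqrt(msep)
    y_bar = sum(observed) / m
    return {
        "PB": pb, "MAD": mad, "MSEP": msep, "RMSE": rmse,
        "PB%": 100 * pb / y_bar,
        "MAD%": 100 * mad / y_bar,
        "RMSE%": 100 * rmse / y_bar,
    }


# Hypothetical validation set: discrepancies are -1, 1, -1, 1
observed = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 11.0, 15.0, 15.0]
summary = prediction_summaries(observed, predicted)
print(summary)
```

Here the PB is zero even though every prediction is wrong by one unit, which is why the PB is best read alongside the MAD or RMSE and a plot of the discrepancies.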
the SLR model that describes seed weight as a linear function of length. We can 
now examine the predictive ability of this model directly with cross-validation. To 
do this, we have split the data into two equally sized subsets; 95 observations were 
selected at random and allocated to the training set and the remaining 95 observa- 
tions were allocated to the validation set. The SLR model was fitted to the training 
set, and accounted for 81.0% of the variance, with the parameter estimates shown in 
Table 13.3. 

The parameter estimates accord with those from the full model (Table 12.4), although 
with a somewhat smaller intercept, steeper slope and larger standard errors, reflecting 
the reduced size of the training set. 

The fitted model is shown in Figure 13.8a with the validation set, and Figure 13.8b shows 
the discrepancies between the observations in the validation set and their predicted val- 
ues, plotted against the associated values of seed length, the explanatory variate. 

TABLE 13.3

Parameter Estimates with Standard Errors (SEs), t-Statistics (t) and Observed Significance Levels
(P) for a SLR Model for Seed Weight in Terms of Seed Length, Based on a Randomly Selected
Training Set of 95 Observations (Example 13.2D)

Term      Parameter   Estimate   SE       t         P
[1]       α           -31.299    3.0537   -10.250   < 0.001
Length    β            18.358    0.9158    20.045   < 0.001



FIGURE 13.8

Cross-validation of SLR model for seed weight with seed length as the explanatory variate, (a) Validation set (95
observations) plotted with the SLR model fitted to the training set (remaining 95 observations) and (b)
discrepancies plotted against the explanatory variate (Example 13.2D).



The fitted model appears to run through the cloud of observations, but the
discrepancies appear to show a general trend of decreasing value as seed length increases, with the
exception of one very long seed. This suggests that the fitted slope is not quite following
the trend in the validation set, although the general pattern is reasonable. The PB takes
value -1.07, or -3.9% as a percentage of the mean seed weight ($\bar{Y}$ = 27.67 in the validation
set), indicating slight over-estimation of the response on average. The MAD and RMSE
are 2.62 and 3.22, respectively, or 9.5% and 11.6% of the mean response. For comparison,
the estimated background standard deviation from the training set was s = 2.76 and so,
the average discrepancy in the validation set is close to background variation in the
training set. These results suggest no great problems with the SLR model. If we decided
to fit a more complex model (e.g. Chapter 14), this type of cross-validation could be used
to compare and evaluate different models.
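
As a check on the percentages quoted in Example 13.2D, the summary statistics can be re-expressed relative to the mean response of the validation set. The worked arithmetic below uses only the values reported in the example (PB = -1.07, MAD = 2.62, RMSE = 3.22 and a mean seed weight of 27.67); the percentage labels MAD% and RMSE% are introduced here by analogy with PB%.

```latex
\begin{align*}
\mathrm{PB}\%   &= 100 \times \frac{-1.07}{27.67} \approx -3.9, \\
\mathrm{MAD}\%  &= 100 \times \frac{2.62}{27.67} \approx 9.5, \\
\mathrm{RMSE}\% &= 100 \times \frac{3.22}{27.67} \approx 11.6 .
\end{align*}
```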

In the case of leave-1-out cross-validation, the sum of the squared discrepancies (N times
the MSEP) is known as the predicted residual sum of squares (PRESS), and a small value of
the PRESS statistic indicates good predictive ability. Leave-1-out cross-validation is also
closely related to the technique of jackknifing, which is used to obtain estimates of bias and
precision for individual parameters or other model statistics. This technique is outside the
scope of our account here, but you can find details in Efron and Tibshirani (1993).
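
Because the leave-1-out discrepancies are the prediction residuals, the PRESS statistic for a linear regression can be obtained from a single fit to the full data, without refitting the model N times. The short derivation below is a sketch that relies on the expression for the prediction residual in terms of the leverage given in Section 13.4.2, and on the usual definition of PRESS as a sum of squares.

```latex
\begin{align*}
\mathrm{PRESS} &= \sum_{i=1}^{N} e_{(i)}^{2}
                = \sum_{i=1}^{N} \left( \frac{e_i}{1 - h_{ii}} \right)^{2}, \\
\mathrm{MSEP}  &= \frac{\mathrm{PRESS}}{N}
                \quad \text{(leave-1-out, so that } M = N\text{)} .
\end{align*}
```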

EXERCISES 

13.1 The microarray study introduced in Exercise 12.6 (data in file senescence.dat) 
investigated gene expression associated with the senescence of leaves. Use SLR 
to predict the expression of gene CATMA2A31 585 over time, and use diagnostic 
plots and a formal test of lack of fit to assess the quality of this model. (We re-visit 
these data in Exercise 17.2.) 

13.2* Now, consider the original data from the experiment described in Exercise 12.1.
The numbers of leaves on each plant (variate NLeaves) are in file cabbage.dat
with unit numbers (ID) and sample dates (variate Days). Fit a SLR and use
diagnostic plots to check the fit of the model. Would a transformation be appropriate
here? If so, implement it and re-fit the SLR on your chosen scale. Plot the fitted
model and check for any evidence of lack of fit. Give a 95% CI for the growth rate
over the period (as leaves per day) and interpret this estimate. Can you
reconcile this result with the one you gave in Exercise 12.1? (We re-visit these data in
Exercise 18.2.)

13.3 Chickweed plants were sampled from a field trial to investigate whether the
number of seeds produced could be related to the plant biomass, measured
as dry weight (g). File chickweed.dat holds unit numbers (ID), the number of
seeds (variate NSeed) and dry weights (variate DryWt) for 36 plants. Investigate
the relationship between the variables, and use diagnostic plots to help decide
whether you can use SLR to give a good description of this relationship. (We
re-visit these data in Exercises 17.3 and 18.7.)*

13.4 In Exercise 12.2, you fitted a SLR to the log-transformed wet weight of flying 
insects collected over 30 years. Re-analyse these data without transformation, 
and use diagnostic plots to assess whether the model assumptions are better met 
on the untransformed or log scale. Check whether there is any sign of correla- 
tion in the errors between successive measurements on your chosen scale. (We 
re-visit these data in Exercise 15.1.) 

13.5 In Exercise 12.4, you used SLR to establish whether the maximum stem length 
(variate MaxLength in file willowstems.dat) could be used as a predictor of dry 
matter (variate DryMatter). 

a. Use diagnostic plots to critically examine the fit of this SLR. Fit SLRs in terms
of the other possible surrogate variables (variate SumLength is the sum of
lengths of all stems, variate SumDiam is the sum of diameters of all stems
and variate LengthTop5 is the average length of the five longest stems) and
investigate the quality of the fit in each case. Which variable is the best
predictor of dry matter?

b. As there are 113 samples, there are sufficient data to compare these SLR
models by cross-validation. Select 57 of the samples and re-fit the SLR for each
explanatory variable from this subset, then assess their fit using the
remaining 56 samples: calculate the prediction bias (PB), mean absolute difference
(MAD) and root mean square error (RMSE). Which model has the best
predictive properties? How did you select your samples, and was this method
satisfactory?

(We re-visit these data in Exercise 14.5.)

13.6* Yield and a measure of disease were gathered from a field trial to try to establish
a yield loss relationship. The unit numbers (ID), disease index (variate Index) and
yield (variate Yield) are in file yieldloss.dat. Fit a SLR and evaluate the quality
of this model. What happens if you exclude any highly influential observations
from the model? What can you conclude about the reliability of the SLR?

13.7 The EXAMINE project (see Example 14.2) identified various measures of
coldness during the winter as good predictors of the date of the first capture of
various aphid species in suction traps. Here, we investigate the predictive ability of
variable C60Day (the average temperature during the coldest 60-day period) to
predict the date of the first capture of the aphid Myzus persicae (variable Mpelst) at
Long Ashton (in south-west England) over the periods 1970-1988 and 1993-2000.
The data (variates ID, Year, C60Day, Mpelst) are held in file longashton.dat. Fit
a SLR and use diagnostic plots to examine the fit. Is there any evidence of
temporal correlation? Identify any influential observations and examine the impact
of excluding them from the fitted model. What conclusions can you draw about
the reliability of the SLR?*



Models for Several Variates: Multiple Linear Regression



In Chapter 12, we introduced simple linear regression, which models a response variable
as a straight line function of a single quantitative explanatory variable, or variate. In this
chapter, we extend the concept to allow several variates in a multiple linear regression
(MLR) model. This extension is analogous to the multi-factor model (Section 8.1) which
investigates the simultaneous effects of several different qualitative variables, or factors,
on the response. These extensions allow more realistic models, as usually several
different explanatory variables might be associated with changes in the response, particularly
in observational studies in which there is little or no control over the experimental
conditions. In these circumstances, there can be strong correlation, or collinearity, between
explanatory variates that can complicate the choice of which variates to include in a model.
This chapter outlines the basic properties of MLR models and introduces methods for
selection of explanatory variables.

As with any modelling exercise, the first step in building a MLR model is to explore the
data, in this case to investigate the inter-relationships among the explanatory variates as
well as those between the response and the individual explanatory variates (Section 14.1).
The general form of the MLR model (Section 14.2) is an extension of the SLR model and,
as in that model, parameter estimation is achieved by the method of least squares (Section
14.3). For a given set of explanatory variates, analysis of variance (ANOVA) is again used to
estimate background variation, to assess whether the variability associated with the model
is large compared with background variation, and to assess the contribution of individual
explanatory variates to the model (Section 14.4). The estimate of background variation is
used to make inferences on the model parameters, including predictions (Section 14.5).
Prediction from the fitted model is often one of the main aims of a MLR, but accurate
prediction requires a well-fitting model. We can investigate model misspecification by
visualization of the contribution of individual explanatory variates to the model (Section
14.6). The choice of explanatory variates to include in a model can be complicated by the
presence of correlation, or collinearity, between them and cases of very strong
collinearity should be detected and avoided (Section 14.7). Additional goodness-of-fit statistics are
available for MLR models (Section 14.8) and these can be used to compare models with
different sets of explanatory variates. These statistics are utilized in various strategies for
model selection (Section 14.9).



14.1 Visualizing Relationships between Variates 

A MLR model aims to describe the relationship between a single response variable and
two or more explanatory variates. Because correlation within the set of explanatory variates
can affect the stability of a MLR model, we first inspect the data to detect which explanatory
variates are clearly associated with the response, to see if these relationships are approximately
linear, and to detect any strong correlations between pairs of potential explanatory variates.

As with SLR models, it is assumed that there is a straight line relationship between the
response variate and each potential explanatory variate; however, in a MLR model, this
relationship might hold only after one or more other explanatory variates has been taken
into account, and so may not be immediately apparent. The presence of strong correlations
between pairs of explanatory variates is an indication of collinearity, where the two
variates are essentially measuring the same characteristic of the response, and which can
create problems in interpretation and (in extreme cases) problems in fitting the model. These
issues are discussed further in Section 14.7. Here, we investigate patterns of pairwise
correlation between variates by constructing a correlation matrix (see Section 2.5) and by
visualizing the underlying relationships in more detail with a scatter plot matrix.

Calculation of the correlation matrix for a response variate and set of potential
explanatory variates summarizes all pairwise correlations within the set (see Section 2.5). A scatter
plot matrix displays the pairwise scatter plots for the set of response and potential
explanatory variates in the form of a matrix so that each row of plots has the same variate plotted
on the y-axis and each column has the same variate plotted on the x-axis (see Figure 14.1).
This visual inspection of patterns of association also allows detection of curved
relationships and of unusual, or outlying, observations.



FIGURE 14.1

Scatter plot matrix for the diploid wheat seed data (Example 14.1).
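
A correlation matrix of this kind can be computed with a few lines of Python (standard library only). The sketch below is illustrative: the function names and the five sets of measurements are hypothetical, invented so that one pair of variates is exactly collinear, and they are not the diploid wheat data.

```python
import math


def pearson(u, v):
    """Sample correlation coefficient between two variates."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    suv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    suu = sum((a - u_bar) ** 2 for a in u)
    svv = sum((b - v_bar) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def correlation_matrix(data):
    """All pairwise correlations, as a dictionary of dictionaries."""
    names = list(data)
    return {a: {b: pearson(data[a], data[b]) for b in names} for a in names}


# Hypothetical measurements; Diameter is exactly half of Length
data = {
    "Length": [2.8, 3.0, 3.1, 3.3, 3.5],
    "Diameter": [1.4, 1.5, 1.55, 1.65, 1.75],
    "Weight": [20.1, 24.5, 25.0, 29.8, 33.2],
}
corr = correlation_matrix(data)

# Print the lower triangle, one row per variate
names = list(data)
for i, row in enumerate(names):
    cells = [f"{corr[row][col]:6.3f}" for col in names[:i]]
    print(f"{row:<9}", *cells)
```

A correlation close to 1 or -1 between two explanatory variates, as for the hypothetical Length and Diameter here, is the signal of collinearity discussed above.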

EXAMPLE 14.1A: DIPLOID WHEAT 

In Example 12.1, we described an experiment in which several morphological traits
were measured on 190 seeds from a line of diploid wheat, Triticum monococcum. The
traits measured on each seed were diameter (mm), length (mm), weight (mg),
moisture content (%) and hardness index. The aim of the analysis was the identification
of variables associated with differences in seed weight. The data can be found in file
triticum.dat and Table A.1. We previously fitted a SLR model to describe seed weight
as a linear function of seed length, and this model accounted for 77.8% of the
variation in seed weight (adjusted $R^2$ = 0.778). Now, we want to know if we can improve this
model by adding information from other explanatory variates.

A scatter plot matrix for the four explanatory variates (Length, Diameter, Moisture, 
Hardness) and the response variate (Weight) is shown in Figure 14.1. 

Plots of seed weight against the explanatory variates are shown in the last row of 
the matrix. It is clear that weight is linearly associated with both length and diameter, 
but there is no obvious association of weight with either of moisture content or hard- 
ness. However, it is still possible that there is a relationship between weight and either 
hardness or moisture content after adjustment for length (or diameter), as this type of 
indirect relationship would not necessarily be visible here. There is a very strong asso- 
ciation between length and diameter, indicating that these variables contain essentially 
the same information (are almost collinear). There are no other associations apparent 
within the set of explanatory variables. The correlation matrix (Table 14.1) corroborates
these observations, and quantifies the strong correlation between length and diameter
(r = 0.999) and between each of these variates and seed weight (r = 0.883 and 0.887,
respectively).



14.2 Defining the Model 

Having gained some insight into the structure of the data, we can start to build models.
For the moment, we ignore the topic of model (or variable) selection, which is discussed in
Section 14.9, and assume that we know which explanatory variates we wish to include in
a MLR model.

TABLE 14.1

Sample Correlations among Response (Weight) and Explanatory Variates, with Observed
Significance Level in Parentheses, for the Diploid Wheat Study (Example 14.1A)

            Length             Diameter           Moisture          Hardness          Weight
Length      -
Diameter    0.999 (< 0.001)    -
Moisture    -0.023 (0.748)     -0.021 (0.773)     -
Hardness    -0.124 (0.088)     -0.125 (0.087)     -0.112 (0.125)    -
Weight      0.883 (< 0.001)    0.887 (< 0.001)    -0.063 (0.390)    -0.207 (0.004)    -



FIGURE 14.2

Plane ($y = \alpha + \beta_1 x_1 + \beta_2 x_2$) for a MLR with two explanatory variates $x_1$ and $x_2$. • observed values above the plane,
• observed values below the plane, ○ fitted values on the plane. Solid vertical lines represent the deviations, $e_i$,
of the observations from the plane. O = origin (0, 0, 0).



The simplest MLR model is an obvious extension of the SLR model to relate a response
variate to two explanatory variates. Where a SLR model fits a straight line in a
two-dimensional space, a MLR model with two explanatory variates fits a plane in a
three-dimensional space, as shown in Figure 14.2.

This model is represented mathematically as

$$y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + e_i , \tag{14.1}$$

where, as for the SLR model, the index i is used to identify the individual observations
with labels from 1 to N (the number of observations). The value $y_i$ is the ith observation of
the response variate (symbolically denoted as variate $y$), $x_{1i}$ and $x_{2i}$ are the associated
values of the first and second explanatory variates (denoted $x_1$ and $x_2$), and $e_i$ is the
deviation from the plane for the ith observation ($i = 1 \ldots N$). The intercept parameter,
$\alpha$, is the fitted response when both explanatory variates are zero, and parameters $\beta_1$ and
$\beta_2$ are the slopes (gradients) of the response as the first and second explanatory variates
vary, respectively. The standard assumptions presented in Section 12.1 all also apply to the
MLR model.

We need also to extend the symbolic form of the explanatory component of the model.
This component now includes the intercept and both explanatory variates as additive
terms. So, the explanatory component of the model from Equation 14.1 is written
symbolically as

Explanatory component: $[1] + x_1 + x_2$

where, as in Section 12.1, [1] denotes the variate taking value 1 everywhere, which is
associated with the intercept parameter $\alpha$, and the explanatory variates $x_1$ and $x_2$ are
associated with the slope parameters $\beta_1$ and $\beta_2$, respectively.

This model represents a plane in three-dimensional space, as shown in Figure 14.2,
defined in terms of the parameters $\alpha$, $\beta_1$ and $\beta_2$. Parameter $\alpha$ is the intercept of the fitted
plane with the y-axis at the values $x_1 = x_2 = 0$, and is also the predicted response at that
point. Parameter $\beta_1$ is the change in value of the fitted plane for one unit increase in the
first explanatory variate, $x_1$, with the value of the second explanatory variate, $x_2$, held
constant. Similarly, parameter $\beta_2$ is the change in value of the fitted plane for one unit
increase in the second explanatory variable, $x_2$, with the value of the first explanatory
variable, $x_1$, held constant. This model is based on the assumption that the two explanatory
variates affect the response variate independently, so that for a fixed value of the second
explanatory variate, the relationship of the response variate with the first explanatory
variate is a straight line: the intercept of this straight line varies according to the value of
the second explanatory variate, but the slope remains constant. This can be seen
mathematically by re-grouping the terms in Equation 14.1 as

$$y_i = (\alpha + \beta_2 x_{2i}) + \beta_1 x_{1i} + e_i .$$

With the value of the second explanatory variate held fixed, this is equivalent to a SLR
model in terms of the first explanatory variate. Of course, a similar interpretation can be
made for the second explanatory variate if the first is held fixed.



EXAMPLE 14.1B: DIPLOID WHEAT 

From the analysis in Example 12.1, we already know that seed weight is strongly related 
to length. From biological arguments, we suspect that for a given length, the weight 
might also be affected by hardness of the seed, and so we add this second explanatory 
variate into the model. In mathematical form, the model can be written as 

$$\mathrm{Weight}_i = \alpha + \beta_1 \mathrm{Length}_i + \beta_2 \mathrm{Hardness}_i + e_i ,$$

where $\mathrm{Weight}_i$, $\mathrm{Length}_i$ and $\mathrm{Hardness}_i$ are the weight, length and hardness index of the
ith seed, respectively, and $e_i$ is the deviation for that seed. As described above, $\alpha$ is the
predicted seed weight for a seed of zero length and zero hardness. This is a substantial
extrapolation beyond the range of the observed data and will probably not have biological
meaning (see discussion at the end of Section 12.5). Parameter $\beta_1$ is the increase in
seed weight for one unit increase in length (with hardness held fixed) and parameter $\beta_2$
is the increase in seed weight for one unit increase in hardness (with length held fixed).

In symbolic form, this model is written as 

Response variable: Weight 

Explanatory component: [1] + Length + Hardness 

where the variates Weight, Length and Hardness contain the values of seed weight, 
length and hardness index, respectively. 

The MLR model can be extended from two to any number of explanatory variates,
although simpler models - if plausible - are generally regarded as more desirable (see
Section 14.9). The mathematical form of a general MLR model with q explanatory variates is

$$y_i = \alpha + \beta_1 x_{1i} + \ldots + \beta_l x_{li} + \ldots + \beta_q x_{qi} + e_i , \tag{14.2}$$

where $y_i$ is the ith observation of the response, $x_{li}$ is the associated value of the lth
explanatory variate (denoted $x_l$ in symbolic form, $l = 1 \ldots q$), and $e_i$ is the deviation for
the ith observation. Again, the standard assumptions presented in Section 12.1 all apply to
this model. We use p to denote the total number of parameters in a MLR model. Here, there
are $p = q + 1$ parameters ($\beta_1$ to $\beta_q$ and $\alpha$) to be estimated from the data. The fitted surface is
now a hyper-plane in multiple dimensions and hard to envisage, but the parameters can
still be interpreted as previously. Parameter $\alpha$ is the predicted response with $x_l = 0$ for all
q explanatory variates. Parameter $\beta_l$ is the change in value of the fitted plane for one unit
increase in the lth explanatory variable with the values of all the other explanatory variates
held constant.
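The interpretation of the parameters can be checked numerically. The following Python sketch evaluates the explanatory component of a general MLR model for one observation and shows that increasing one explanatory variate by one unit, with the others held fixed, changes the fitted value by the corresponding slope. The function name and all parameter values are hypothetical, chosen only for illustration.

```python
def fitted_value(alpha, betas, xs):
    """Fitted response alpha + sum over l of beta_l * x_l for one observation."""
    if len(betas) != len(xs):
        raise ValueError("need one explanatory value per slope parameter")
    return alpha + sum(b * x for b, x in zip(betas, xs))

# Hypothetical parameter values, chosen only for illustration.
alpha = 2.0
betas = [1.5, -0.5, 0.25]   # q = 3 slopes, so p = q + 1 = 4 parameters

base = fitted_value(alpha, betas, [4.0, 2.0, 8.0])
# Increase the second explanatory variate by one unit, holding the others fixed.
shifted = fitted_value(alpha, betas, [4.0, 3.0, 8.0])

print(base)             # fitted value at the starting point
print(shifted - base)   # equals the second slope
print(fitted_value(alpha, betas, [0.0, 0.0, 0.0]))  # equals the intercept
```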



14.3 Estimating the Model Parameters 

Many of the principles introduced in the context of a SLR model apply directly also to a
MLR model. Parameter estimates are obtained by the principle of least squares, finding the
parameter values that minimize the residual sum of squares, ResSS. For the general MLR
model of Equation 14.2, the fitted values are written as

$$\hat{y}_i = \hat{\alpha} + \hat{\beta}_1 x_{1i} + \ldots + \hat{\beta}_l x_{li} + \ldots + \hat{\beta}_q x_{qi} , \tag{14.3}$$

where estimated values are again indicated by the hat (^) embellishment. The simple
residuals are again the differences between the observed and fitted values

$$\hat{e}_i = y_i - \hat{y}_i .$$

The ResSS is the sum of the squares of these simple residuals and so can be written as

$$\mathrm{ResSS} = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{N} \left[ y_i - (\hat{\alpha} + \hat{\beta}_1 x_{1i} + \ldots + \hat{\beta}_l x_{li} + \ldots + \hat{\beta}_q x_{qi}) \right]^2 .$$
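To make the least squares criterion concrete, the following Python sketch evaluates the residual sum of squares for any candidate set of parameter values. The function name and the toy data are hypothetical; the responses were generated exactly from y = 1 + 2 x1 - x2, so the generating parameters give a ResSS of zero and any other choice gives a larger value.

```python
def residual_ss(y, xs, alpha, betas):
    """Residual sum of squares for candidate parameter values.

    y     : list of N responses
    xs    : list of N rows, each holding the q explanatory values
    alpha : candidate intercept
    betas : list of q candidate slopes
    """
    total = 0.0
    for y_i, row in zip(y, xs):
        fitted = alpha + sum(b * x for b, x in zip(betas, row))
        total += (y_i - fitted) ** 2
    return total

# Hypothetical toy data generated exactly from y = 1 + 2*x1 - x2.
xs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [1.0, 3.0, 0.0, 2.0, 4.0]

print(residual_ss(y, xs, 1.0, [2.0, -1.0]))  # generating parameters
print(residual_ss(y, xs, 1.0, [2.0, 0.0]))   # wrong second slope: larger ResSS
```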

The estimated least squares parameters are those that minimize this quantity. We do not
present the derivation of these parameter estimates, but details can be found in Rawlings
et al. (1998). We also do not present the general form of the parameter estimates in the
MLR model, as the expressions are often complex and difficult to interpret, and in practice
we obtain these estimates from statistical software. However, for a MLR model with
just two explanatory variates, the expressions are relatively easy to obtain, and give some
insight into the adjustments made when more than one variate is present. As for the SLR
model, estimates are based on the sums of squares and cross-products for the response
and explanatory variates, introduced in Section 12.2. The sums of squares for the response
and two explanatory variates are written as

$$SS_{yy} = \sum_{i=1}^{N} (y_i - \bar{y})^2 ; \quad SS_{x_1 x_1} = \sum_{i=1}^{N} (x_{1i} - \bar{x}_1)^2 ; \quad SS_{x_2 x_2} = \sum_{i=1}^{N} (x_{2i} - \bar{x}_2)^2 ,$$

where $\bar{y}$ is the mean of the response variate, and $\bar{x}_1$ and $\bar{x}_2$ are the means of the two
explanatory variates, respectively. The sums of cross-products between the variates are
written as

$$SS_{x_1 y} = \sum_{i=1}^{N} (x_{1i} - \bar{x}_1)(y_i - \bar{y}) ; \quad SS_{x_2 y} = \sum_{i=1}^{N} (x_{2i} - \bar{x}_2)(y_i - \bar{y}) ; \quad SS_{x_1 x_2} = \sum_{i=1}^{N} (x_{1i} - \bar{x}_1)(x_{2i} - \bar{x}_2) .$$
Recall that the sums of squares and the sums of cross-products, as presented in Section
12.2, are simply scaled versions of unbiased sample variances and covariances, respectively.
The two slope parameter estimates are then calculated as

$$\hat{\beta}_1 = \frac{(SS_{x_2 x_2} \times SS_{x_1 y}) - (SS_{x_1 x_2} \times SS_{x_2 y})}{(SS_{x_1 x_1} \times SS_{x_2 x_2}) - (SS_{x_1 x_2})^2} ,$$

$$\hat{\beta}_2 = \frac{(SS_{x_1 x_1} \times SS_{x_2 y}) - (SS_{x_1 x_2} \times SS_{x_1 y})}{(SS_{x_1 x_1} \times SS_{x_2 x_2}) - (SS_{x_1 x_2})^2} ,$$

and the estimated intercept parameter can be written in terms of the estimated slopes and
the response and explanatory variate means as

$$\hat{\alpha} = \bar{y} - \hat{\beta}_1 \bar{x}_1 - \hat{\beta}_2 \bar{x}_2 .$$

The estimated slopes for the explanatory variates ($\hat{\beta}_1$ and $\hat{\beta}_2$) usually differ from those
obtained from two separate SLR models because of correlation between the two variates.
However, if this correlation (and hence the sum of cross-products) is zero, i.e. if $SS_{x_1 x_2} = 0$,
then the estimated slopes are equal to those that would be obtained in the two separate
SLR models, because the second term in both the numerator and denominator of the
expressions for the slope parameter estimates becomes zero when $SS_{x_1 x_2} = 0$. When the
correlation between two explanatory variates is zero, we refer to them as orthogonal, and
their effect on the response can be ascertained independently (as discussed for factor
models in Section 11.1). When the correlation is not zero, the coefficient for one variate must be
adjusted for the presence of the other in the model, with the adjustment depending on the
covariance between them.
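The two-variate expressions translate directly into code. The following Python sketch implements the slope and intercept estimates from the sums of squares and cross-products, and applies them to a small orthogonal layout in which the sum of cross-products between the explanatory variates is zero, so that the MLR slopes coincide with those from two separate SLR fits. The function names and the toy data are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def cross_ss(u, v):
    """Sum of cross-products of deviations; cross_ss(u, u) is a sum of squares."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

def fit_two_variates(y, x1, x2):
    """Least squares estimates (alpha, beta1, beta2) for y = alpha + beta1*x1 + beta2*x2."""
    ss11, ss22, ss12 = cross_ss(x1, x1), cross_ss(x2, x2), cross_ss(x1, x2)
    ss1y, ss2y = cross_ss(x1, y), cross_ss(x2, y)
    denom = ss11 * ss22 - ss12 ** 2
    if denom == 0:
        raise ValueError("x1 and x2 are exactly collinear; slopes are not unique")
    beta1 = (ss22 * ss1y - ss12 * ss2y) / denom
    beta2 = (ss11 * ss2y - ss12 * ss1y) / denom
    alpha = mean(y) - beta1 * mean(x1) - beta2 * mean(x2)
    return alpha, beta1, beta2

# Hypothetical orthogonal layout: x1 and x2 have zero sum of cross-products.
x1 = [-1.0, -1.0, 1.0, 1.0]
x2 = [-1.0, 1.0, -1.0, 1.0]
y = [1.0, 2.0, 4.0, 7.0]

alpha, beta1, beta2 = fit_two_variates(y, x1, x2)
slr_slope_x1 = cross_ss(x1, y) / cross_ss(x1, x1)  # slope from a separate SLR on x1
slr_slope_x2 = cross_ss(x2, y) / cross_ss(x2, x2)  # slope from a separate SLR on x2

print(alpha, beta1, beta2)
print(slr_slope_x1, slr_slope_x2)
```

The explicit check on the denominator reflects the fact that the expressions break down when the two explanatory variates are exactly collinear.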



EXAMPLE 14.1C: DIPLOID WHEAT 

We now estimate parameters in the MLR model of Example 14.1B for seed weight in terms
of the explanatory variates length ($x_1$ = Length) and hardness index ($x_2$ = Hardness).

The sums of squares and cross-products for this set of variables can be calculated
as $SS_{x_1 x_1} = 19.2699$, $SS_{x_2 x_2} = 29{,}721.0461$, $SS_{x_1 y} = 330.9297$, $SS_{x_2 y} = -3049.2033$ and
$SS_{x_1 x_2} = -94.0380$. The variate means are $\bar{y} = 28.658$, $\bar{x}_1 = 3.295$ and $\bar{x}_2 = 13.297$. Hence,
the parameter estimates can be obtained as

$$\hat{\beta}_1 = \frac{(29721.05 \times 330.93) - (-94.04 \times -3049.20)}{(19.27 \times 29721.05) - (-94.04)^2} = 16.934 ,$$

$$\hat{\beta}_2 = \frac{(19.27 \times -3049.20) - (-94.04 \times 330.93)}{(19.27 \times 29721.05) - (-94.04)^2} = -0.049 ,$$

$$\hat{\alpha} = 28.66 - (16.93 \times 3.30) - (0.049 \times 13.30) = -27.795 .$$

The fitted MLR model can therefore be written as

$$\mathrm{Weight}_i = -27.795 + 16.934 \times \mathrm{Length}_i - 0.049 \times \mathrm{Hardness}_i .$$

Hence an increase of 1 mm in seed length corresponds, on average, to an increase in
seed weight of 16.93 mg for a fixed value of hardness. This value is a little different to
that obtained in the SLR model of Example 12.1B (in which the estimated slope was
17.17) because of an adjustment for hardness index in the model. This adjustment is
small because the correlation between these two variables (r = -0.124, Table 14.1) is also
small. For a given seed length, an increase of one unit in hardness index corresponds to
a decrease in seed weight, on average, of 0.049 mg.
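The slope calculations in this example can be reproduced from the quoted sums of squares and cross-products. The following Python sketch does this; the variable names are our own, and only the two slopes are recomputed. The results should agree with the values 16.934 and -0.049 quoted above, up to rounding.

```python
# Sums of squares and cross-products as quoted in Example 14.1C
# (x1 = Length, x2 = Hardness, y = Weight).
ss_x1x1 = 19.2699
ss_x2x2 = 29721.0461
ss_x1y = 330.9297
ss_x2y = -3049.2033
ss_x1x2 = -94.0380

# Common denominator of the two slope expressions.
denom = ss_x1x1 * ss_x2x2 - ss_x1x2 ** 2

beta1_hat = (ss_x2x2 * ss_x1y - ss_x1x2 * ss_x2y) / denom
beta2_hat = (ss_x1x1 * ss_x2y - ss_x1x2 * ss_x1y) / denom

print(round(beta1_hat, 3))  # slope for Length
print(round(beta2_hat, 3))  # slope for Hardness
```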



14.4 Assessing the Importance of Individual Explanatory Variates 

For a MLR model, the ANOVA is used to estimate the background variability which can
then be used to make statistical inferences, to assess whether variation associated with the
fitted model is larger than background variation, and to assess the contributions of
individual explanatory variates to the model.

As with the SLR model, the summary ANOVA for a MLR model partitions the total sum
of squares, TotSS, into a component due to the regression model, ModSS, and the residual
variation, ResSS. The calculations for these quantities take the same generic form as those
presented in Section 12.3, namely

$$\mathrm{TotSS} = \sum_{i=1}^{N} (y_i - \bar{y})^2 ; \quad \mathrm{ModSS} = \sum_{i=1}^{N} (\hat{y}_i - \bar{y})^2 ; \quad \mathrm{ResSS} = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 .$$

Their form in terms of sums of squares and cross-products is now more complex and so is
omitted here. The degrees of freedom are partitioned in a similar manner, following the
same principles as for the SLR model. In the MLR model, the total df is still TotDF = N - 1.
The number of parameters, p, is now equal to the number of explanatory variates plus one,
i.e. p = q + 1. The model df is then ModDF = q = p - 1, and the residual df is ResDF = N - p.
The form of the resulting ANOVA table is shown in Table 14.2.

An estimate of the background variation, $s^2$, is obtained from the residual mean square as

$$s^2 = \mathrm{ResMS} = \frac{\mathrm{ResSS}}{N - p} .$$

However, as for the SLR model, this quantity is a good estimate of the true background
variation only if there is no model misspecification (see the discussion in Section 13.1).
Diagnostic checks on the model fit and residuals should be made before conclusions are
drawn; these checks are described in Section 14.6.
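The partition of the total sum of squares and degrees of freedom can be verified numerically. The following Python sketch fits a MLR model with two explanatory variates, using the standard two-variate least squares expressions, and returns the entries of the summary ANOVA table. The function names and the toy data are hypothetical and are not the wheat data.

```python
def mean(values):
    return sum(values) / len(values)

def cross_ss(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

def summary_anova(y, x1, x2):
    """Summary ANOVA quantities for the model [1] + x1 + x2 (p = 3 parameters)."""
    n, p = len(y), 3
    ss11, ss22, ss12 = cross_ss(x1, x1), cross_ss(x2, x2), cross_ss(x1, x2)
    ss1y, ss2y = cross_ss(x1, y), cross_ss(x2, y)
    denom = ss11 * ss22 - ss12 ** 2
    b1 = (ss22 * ss1y - ss12 * ss2y) / denom
    b2 = (ss11 * ss2y - ss12 * ss1y) / denom
    a = mean(y) - b1 * mean(x1) - b2 * mean(x2)
    fitted = [a + b1 * u + b2 * v for u, v in zip(x1, x2)]
    ybar = mean(y)
    tot_ss = sum((obs - ybar) ** 2 for obs in y)
    mod_ss = sum((fit - ybar) ** 2 for fit in fitted)
    res_ss = sum((obs - fit) ** 2 for obs, fit in zip(y, fitted))
    mod_ms = mod_ss / (p - 1)
    res_ms = res_ss / (n - p)  # s^2, the estimate of background variation
    return {
        "TotSS": tot_ss, "ModSS": mod_ss, "ResSS": res_ss,
        "TotDF": n - 1, "ModDF": p - 1, "ResDF": n - p,
        "ModMS": mod_ms, "ResMS": res_ms, "F": mod_ms / res_ms,
    }

# Hypothetical toy data (not the wheat data).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [2.9, 4.2, 7.1, 7.6, 11.4, 11.8]

table = summary_anova(y, x1, x2)
for name, value in table.items():
    print(name, round(value, 4))
```

Up to floating-point rounding, TotSS equals ModSS + ResSS and TotDF equals ModDF + ResDF.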



TABLE 14.2

Structure of the Summary ANOVA Table for a MLR Model with q Explanatory Variates and
p = q + 1 Parameters

Source of Variation   df      Sum of Squares   Mean Square              Variance Ratio     P
Model                 p - 1   ModSS            ModMS = ModSS/(p - 1)    F = ModMS/ResMS    $\mathrm{Prob}(F_{p-1,N-p} > F)$
Residual              N - p   ResSS            ResMS = ResSS/(N - p)
Total                 N - 1   TotSS
The F-test for the model, based on the variance ratio F = ModMS/ResMS, now relates to
the null hypothesis that all of the slope coefficients for the explanatory variates are equal
to zero, i.e. $H_0$: $\beta_l = 0$ for $l = 1 \ldots q$, and hence that there is no relationship between the
response and the set of explanatory variates. The alternative hypothesis is that there is
some relationship, and hence the regression coefficients $\beta_1 \ldots \beta_q$ are not all equal to zero.
The F-test uses p - 1 numerator and N - p denominator df, corresponding to the degrees
of freedom for the model and residual mean squares, respectively. An observed value of
F larger than the $100(1 - \alpha_s)$th quantile of this distribution gives evidence at significance
level $\alpha_s$ of some relationship between the response and the set of explanatory variables.
Alternatively, an observed significance level can be calculated as $P = \mathrm{Prob}(F_{p-1,N-p} > F)$.

EXAMPLE 14.1D: DIPLOID WHEAT 

The ANOVA table from the MLR model relating seed weight to length and hardness 
index is shown in Table 14.3. The observed variance ratio, F = 349.106, is larger than the 
99.9th percentile of the F-distribution with 2 numerator and 187 denominator degrees of 
freedom (P < 0.001) giving strong evidence of a relationship between seed weight and 
this combination of explanatory variables. We should not be surprised as we had already 
established a strong association of seed weight with seed length in the SLR model. 
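The arithmetic behind Table 14.3 is easily checked. The following Python sketch recomputes the mean squares, variance ratio and total sum of squares from the sums of squares and df quoted in the table; the variable names are our own.

```python
# Sums of squares and df as quoted in Table 14.3.
mod_ss, mod_df = 5753.4738, 2
res_ss, res_df = 1540.9352, 187

mod_ms = mod_ss / mod_df          # model mean square
res_ms = res_ss / res_df          # residual mean square, s^2
variance_ratio = mod_ms / res_ms  # F-statistic on (2, 187) df
tot_ss = mod_ss + res_ss          # total sum of squares on 189 df

print(round(mod_ms, 4), round(res_ms, 4))
print(round(variance_ratio, 3))
print(round(tot_ss, 4))
```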

A high significance (P < 0.05) for the F-test gives evidence of some association between
the response variate and the set of explanatory variates, but does not indicate the contribu- 
tion of individual explanatory variates to this association. These individual contributions 
can be evaluated by further partitioning the model variation (ModSS) into components 
associated with each of the explanatory variates, but when the explanatory variates are 
correlated, this partition is not unique. This leads us to the concept of a sequential ANOVA 
table, in which the sums of squares depend on the order in which explanatory variates are 
added into the model. We introduced this concept in Section 11.2 for models based on fac- 
tors; here, we adapt it to regression models for explanatory variates. 

14.4.1 Adding Terms into the Model: Sequential ANOVA and Incremental 
Sums of Squares 

We can build a MLR model by starting with the intercept and adding each of the explana- 
tory variates in turn, giving a sequence of sub-models that ends with the full model con- 
taining all of the explanatory variates. These sub-models can be used to form a sequential 
ANOVA table, which quantifies the change in the model sum of squares as explanatory 
variates are added into the model in a particular sequence. 

To construct the sequential ANOVA table, we first need to define some quantities
associated with the sequence of sub-models (previously defined for models with factors in
Section 11.2). We identify the model sum of squares and df with a particular sub-model by
explicitly specifying it within parentheses. For example, $\mathrm{ModSS}([1] + x_1 + x_2)$ is the model
sum of squares associated with a model containing the intercept and explanatory variates
$x_1$ and $x_2$, with degrees of freedom $\mathrm{ModDF}([1] + x_1 + x_2) = 2$. This notation is not compact,
but it has the advantage of being unambiguous. The model containing the intercept term
alone is regarded as the initial or baseline model, and has zero sum of squares and df, i.e.
$\mathrm{ModSS}([1]) = 0$ and $\mathrm{ModDF}([1]) = 0$; this is discussed further below.



TABLE 14.3

Summary ANOVA Table for a MLR Model for Seed Weight with
Length and Hardness as Explanatory Variates (Example 14.1D)

Source of Variation   df    Sum of Squares   Mean Square   Variance Ratio   P
Model                 2     5753.4738        2876.7369     349.106          < 0.001
Residual              187   1540.9352        8.2403
Total                 189   7294.4090

The sequential ANOVA table is then derived from the set of model sums of squares
and df. As each explanatory variate is added into the model, the increase in ModSS and
ModDF is attributed to that variate. These changes in the sums of squares and degrees
of freedom are called the incremental or Type I sums of squares and df, and both must
always be greater than (or equal to) zero. These incremental quantities are labelled by both
the explanatory variate added and the terms already in the model. For example, on adding
the variate $x_2$ to a sub-model containing the intercept and variate $x_1$, we label the change
as $+x_2 \mid [1] + x_1$, to be read as 'adding variate $x_2$ given that the intercept and variate $x_1$ are
already in the model' or equivalently 'adding variate $x_2$ after accounting for (eliminating)
the terms $[1] + x_1$'. The incremental sums of squares and df are denoted SS and DF,
respectively. So, for example

$$\mathrm{SS}(+x_2 \mid [1] + x_1) = \mathrm{ModSS}([1] + x_1 + x_2) - \mathrm{ModSS}([1] + x_1) ,$$

$$\mathrm{DF}(+x_2 \mid [1] + x_1) = \mathrm{ModDF}([1] + x_1 + x_2) - \mathrm{ModDF}([1] + x_1) .$$

Each quantity is calculated as a difference between the model containing all of the variates 
listed and the model containing only the variates listed after the ' | ' symbol, i.e. those in 
the previous sub-model. Again, this notation is somewhat cumbersome but unambiguous. 
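The definition of incremental sums of squares as differences between sub-model sums of squares can be sketched in Python as follows. The function names and toy data are hypothetical. The code relies on two standard results that are not derived in this chapter: for a SLR model the model sum of squares is the squared sum of cross-products divided by the sum of squares of the explanatory variate, and for a MLR model with two variates it is the sum of each estimated slope multiplied by the corresponding sum of cross-products with the response.

```python
def mean(values):
    return sum(values) / len(values)

def cross_ss(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

def mod_ss_one(y, x):
    """ModSS([1] + x) for a SLR model."""
    return cross_ss(x, y) ** 2 / cross_ss(x, x)

def mod_ss_two(y, x1, x2):
    """ModSS([1] + x1 + x2), using ModSS = beta1*SS_x1y + beta2*SS_x2y."""
    ss11, ss22, ss12 = cross_ss(x1, x1), cross_ss(x2, x2), cross_ss(x1, x2)
    ss1y, ss2y = cross_ss(x1, y), cross_ss(x2, y)
    denom = ss11 * ss22 - ss12 ** 2
    beta1 = (ss22 * ss1y - ss12 * ss2y) / denom
    beta2 = (ss11 * ss2y - ss12 * ss1y) / denom
    return beta1 * ss1y + beta2 * ss2y

def sequential_ss(y, first, second):
    """Incremental SS for adding `first` to [1], then `second` to [1] + first."""
    ss_first = mod_ss_one(y, first) - 0.0  # baseline ModSS([1]) = 0
    ss_second = mod_ss_two(y, first, second) - mod_ss_one(y, first)
    return ss_first, ss_second

# Hypothetical toy data with correlated explanatory variates.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [2.9, 4.2, 7.1, 7.6, 11.4, 11.8]

order_a = sequential_ss(y, x1, x2)  # x1 first, then x2
order_b = sequential_ss(y, x2, x1)  # x2 first, then x1
print("x1 then x2:", [round(v, 3) for v in order_a])
print("x2 then x1:", [round(v, 3) for v in order_b])
```

In both orders the incremental sums of squares add up to the same full-model ModSS, but the amount attributed to each variate depends on the order of fitting.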

In the context of a sequential ANOVA, we use some abbreviations by considering the
table as a whole. For example, instead of listing the change and the terms already present
in the model, we can deduce the terms already in the model from previous lines in the
ANOVA table and just indicate the additional term. Hence, we can use $\mathrm{SS}(+x_2)$ and $\mathrm{DF}(+x_2)$
to denote $\mathrm{SS}(+x_2 \mid M)$ and $\mathrm{DF}(+x_2 \mid M)$, respectively, where M is a list of terms added in
previous lines of the ANOVA table. For example, Table 14.4 is the sequential ANOVA table for
the MLR model obtained by our fitting first the variate $x_1$ and then the variate $x_2$. The first
line of the table adds variate $x_1$ into a model containing the intercept only. The incremental
sum of squares is then

$$\mathrm{SS}(+x_1) = \mathrm{SS}(+x_1 \mid [1]) = \mathrm{ModSS}([1] + x_1) - \mathrm{ModSS}([1]) .$$



TABLE 14.4

Structure of the Sequential ANOVA Table for a MLR Model with Two Explanatory
Variates, $x_1$ and $x_2$

Term Added   Incremental df       Incremental Sum of Squares   Incremental Mean Square                Variance Ratio
+ $x_1$      $\mathrm{DF}(+x_1) = 1$   $\mathrm{SS}(+x_1)$                 $\mathrm{MS}(+x_1) = \mathrm{SS}(+x_1)/1$   $F^{x_1} = \mathrm{MS}(+x_1)/\mathrm{ResMS}$
+ $x_2$      $\mathrm{DF}(+x_2) = 1$   $\mathrm{SS}(+x_2)$                 $\mathrm{MS}(+x_2) = \mathrm{SS}(+x_2)/1$   $F^{x_2} = \mathrm{MS}(+x_2)/\mathrm{ResMS}$
Residual     ResDF                ResSS                        ResMS
Total        N - 1                TotSS
We then add the variate $x_2$ into the model, with incremental sum of squares

$$\mathrm{SS}(+x_2) = \mathrm{SS}(+x_2 \mid [1] + x_1) = \mathrm{ModSS}([1] + x_1 + x_2) - \mathrm{ModSS}([1] + x_1) .$$

Mean squares are again calculated by division of the incremental sums of squares by the
corresponding incremental df, and variance ratios for each explanatory variate are
calculated with respect to the residual mean square, ResMS. The variance ratios can be used to
test the null hypothesis that the response has no dependence on the explanatory variate
being added, given that the explanatory variates previously included are also present in
the model, i.e. that the coefficient for the explanatory variable added is equal to zero. For
example, in Table 14.4, variance ratio $F^{x_1}$ is used to test the hypothesis that the response
has no association with variate $x_1$, given that the intercept is present in the model. The
variance ratio $F^{x_2}$ is used to test the hypothesis that the response has no association with
variate $x_2$ given that both the intercept and variate $x_1$ are already in the model or,
equivalently, whether adding variate $x_2$ has made any improvement to the SLR model containing
$x_1$. Under the null hypothesis, each variance ratio in this sequential ANOVA table has an
F-distribution with 1 numerator df and denominator df equal to ResDF.

As stated above, if the explanatory variates are correlated, then the values in the
sequential ANOVA, and hence the incremental F-tests, depend on the order in which the variates
are added into the model. This reflects the fact that the incremental F-tests are evaluating
different hypotheses for different sequences of sub-models. In the example of two variates
shown in Table 14.4, suppose that explanatory variate $x_2$ was fitted first, followed by variate
$x_1$. In that case, the incremental F-test for explanatory variate $x_2$ gives evidence on whether
the response is associated with that variate, given that the intercept is in the model. The
incremental F-test for the second explanatory variate added ($x_1$) gives evidence on whether
this variate leads to any improvement in the fit of a SLR model already containing $x_2$.
These hypotheses are different from those tested in the original sequence of sub-models
and so it is possible for different results to be obtained. This can lead to some ambiguity in
choice of model as, for example, we may find that the incremental F-tests for both $x_1 \mid [1]$ and
$x_2 \mid ([1] + x_1)$ are significant for model $[1] + x_1 + x_2$, but that only the test for $x_2 \mid [1]$ is
significant when fitting $[1] + x_2 + x_1$ (where the order of terms defines the order in which the
terms are added to the model). In general, when selecting a model, we aim for parsimony, i.e. using
the fewest parameters possible to get an adequate description of the response variable. In
this example, this principle would choose the model with only the intercept and
explanatory variate $x_2$, because adding $x_1$ does not then improve the fit.

EXAMPLE 14.1E: DIPLOID WHEAT 

The two incremental ANOVA tables from fitting the MLR model relating seed weight 
to length and hardness index are in Table 14.5. In this case, the conclusions are straight- 
forward, as all incremental F-tests are significant, indicating that both variates are 
required in the model. Even though the correlation between the two explanatory vari- 
ates is weak, r = -0.124, the values of the incremental F-tests for the different model 
orders are distinctly different. 
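The entries of Table 14.5 can be cross-checked with a few lines of arithmetic. The following Python sketch takes the incremental sums of squares quoted in the table for both orders of fitting, confirms that each order gives the same total model sum of squares, and recomputes the variance ratios against the residual mean square from the full model; the variable names are our own.

```python
# Residual mean square from the full model (Table 14.5: ResSS on 187 df).
res_ms = 1540.94 / 187

# Incremental sums of squares as quoted in Table 14.5, for both orders of fitting.
orders = {
    "Length then Hardness": [("+ Length", 5683.18), ("+ Hardness", 70.30)],
    "Hardness then Length": [("+ Hardness", 312.83), ("+ Length", 5440.64)],
}

totals = {}
for label, rows in orders.items():
    totals[label] = sum(ss for _, ss in rows)  # ModSS for the full model
    print(label, "ModSS =", round(totals[label], 2))
    for term, ss in rows:
        # Each incremental term has 1 df, so MS = SS and VR = SS / ResMS.
        print("  ", term, "VR =", round(ss / res_ms, 2))
```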

There is some ambiguity in the process for testing incremental sums of squares that
requires further explanation. For example, if we are considering a model containing just
variate $x_1$, then the ResMS in the (sequential) ANOVA table is calculated having removed
the effect of this term only. If we are considering a model with two explanatory variates, $x_1$
and $x_2$, and we add $x_1$ first, then we get the same incremental sum of squares for that variate,
but the ResMS in the resulting sequential ANOVA table is now calculated after removal of
both terms. The variance ratio for $x_1$ and the associated F-test, which depend on the ResMS,
will therefore differ between these two situations. Perhaps surprisingly, both tests are valid
even though they give different results, and each approach has its own advantages.

First, we consider the approach of adding one explanatory variate to the model at a time,
forming the sequential ANOVA table, and testing against the ResMS from the current model, i.e.
the ResMS calculated after estimation of only the terms in the current model. This has the
disadvantage that the approach may lack statistical power at early stages when important
explanatory variates have not yet been included in the model, because the ResMS will be
inflated and variance ratios will therefore be reduced in comparison with those for the
full model. The alternative approach is to use just one ResMS in the calculation of variance
ratios, taken from the ANOVA table for the full model including all explanatory variates.
This has the potential disadvantage that the ResMS may be based on relatively few df when
there are many potential explanatory variates present, so that the tests lack power.

To illustrate this, we consider two contrasting situations: a designed experiment in which
there are a few pre-defined explanatory variables; and an observational study, where many
potential explanatory variables might be available. In the first situation, the concept of the 'full
model', containing all explanatory variables of interest, is well defined, and in this context, it
makes sense to form a sequential ANOVA for the full model and to base variance ratios on
the ResMS from this full model. In the second situation, it is often not clear which
explanatory variables are most likely to be relevant, and including the full set is unlikely to help. In
this context, adding one term to the model at a time, and using the ResMS from the current
model to form the variance ratios, appears more sensible. In practice, most examples fall
between these two extremes, and the use of common sense is required.



TABLE 14.5

Sequential ANOVA Tables (Mean Squares Not Shown) from MLR for Seed Weight with
Explanatory Variates Length (Length) and Hardness Index (Hardness) (Example 14.1E)

Term         df    SS        VR       P          Term         df    SS        VR       P
+ Length     1     5683.18   689.68   < 0.001    + Hardness   1     312.83    37.96    < 0.001
+ Hardness   1     70.30     8.53     0.004      + Length     1     5440.64   660.25   < 0.001
Residual     187   1540.94                       Residual     187   1540.94
Total        189   7294.41                       Total        189   7294.41

Note: df = incremental df, SS = incremental sum of squares, VR = variance ratio.

Throughout this section, we have used a model containing the intercept term only as our 
baseline model. This is the usual convention, but it requires some modification for regres- 
sion through the origin (Section 12.9.2) when no intercept term is included in the model. 
In this case, the definitions of model sum of squares and df used above are inappropriate, 
and must be amended to the uncorrected versions defined in Section 12.9.2. 



14.4.2 The Impact of Removing Model Terms: Marginal Sums of Squares 

We can also take a somewhat different approach to this problem and, instead of
progressively adding variates into a model, we can start with a model containing the full set of
explanatory variates and obtain marginal F-tests by removing each explanatory variate
from the model in turn. These marginal F-tests relate to the null hypothesis that the
coefficient of the lth explanatory variate is zero given that the rest of the explanatory variates
are present in the model. The changes in the model sum of squares caused by dropping
each variate in turn are often called the marginal or Type III sums of squares. We denote
these marginal sums of squares and df by defining the variate to be dropped from the
model and the model from which it is to be dropped, for example

$$\mathrm{SS}(-x_1 \mid [1] + x_1 + x_2) = \mathrm{ModSS}([1] + x_1 + x_2) - \mathrm{ModSS}([1] + x_2) ,$$

$$\mathrm{DF}(-x_1 \mid [1] + x_1 + x_2) = \mathrm{ModDF}([1] + x_1 + x_2) - \mathrm{ModDF}([1] + x_2) .$$

The ' - ' sign indicates that the term is to be removed from the model following the ' | '
symbol. When the full model on the right-hand side is clear from context, we abbreviate the
marginal sums of squares as $\mathrm{SS}(-x_1)$. The form of the marginal sums of squares and F-tests
for a MLR model with two explanatory variates, $x_1$ and $x_2$, is shown in Table 14.6. In this
table, $\mathrm{SS}(-x_1)$ is defined as above and $\mathrm{SS}(-x_2) = \mathrm{SS}(-x_2 \mid [1] + x_1 + x_2)$. Again, mean squares are
obtained by division of the sums of squares by their df, and the ResMS is that taken from
the full model specified.



TABLE 14.6

Form of Marginal F-Tests for Two Variates, $x_1$ and $x_2$, with Full Fitted Model $[1] + x_1 + x_2$

Term Dropped   Marginal df          Marginal Sum of Squares   Marginal Mean Square                   Variance Ratio
- $x_1$        $\mathrm{DF}(-x_1) = 1$   $\mathrm{SS}(-x_1)$              $\mathrm{MS}(-x_1) = \mathrm{SS}(-x_1)/1$   $F^{x_1} = \mathrm{MS}(-x_1)/\mathrm{ResMS}$
- $x_2$        $\mathrm{DF}(-x_2) = 1$   $\mathrm{SS}(-x_2)$              $\mathrm{MS}(-x_2) = \mathrm{SS}(-x_2)/1$   $F^{x_2} = \mathrm{MS}(-x_2)/\mathrm{ResMS}$
Residual       ResDF                ResSS                     ResMS

EXAMPLE 14.1F: DIPLOID WHEAT 

The marginal sums of squares and F-tests from the MLR model relating seed weight to 
length and hardness index are derived in Table 14.7. As expected, following the analysis 
of Example 14.1E, both marginal F-tests are significant. 
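The marginal sums of squares can be recovered from quantities already quoted in earlier tables. The following Python sketch subtracts the model sum of squares for each single-variate sub-model (the first line of each sequential table in Table 14.5) from the full-model sum of squares in Table 14.3; the variable names are our own. Because the inputs are rounded to two decimal places in Table 14.5, agreement with Table 14.7 is only approximate.

```python
mod_ss_full = 5753.4738        # ModSS([1] + Length + Hardness), Table 14.3
mod_ss_length_only = 5683.18   # ModSS([1] + Length), Table 14.5 (Length fitted first)
mod_ss_hardness_only = 312.83  # ModSS([1] + Hardness), Table 14.5 (Hardness fitted first)
res_ms = 1540.9352 / 187       # ResMS from the full model

# Dropping a variate leaves the sub-model containing only the other variate.
marginal = {
    "- Length": mod_ss_full - mod_ss_hardness_only,
    "- Hardness": mod_ss_full - mod_ss_length_only,
}

for term, ss in marginal.items():
    # Each term has 1 df, so the mean square equals the sum of squares.
    print(term, round(ss, 2), round(ss / res_ms, 2))
```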

Note that, for the last variate added to the model, the incremental (Type I) and mar- 
ginal (Type III) sums of squares (and F-tests) are equal. In general, both the incremental 
and marginal sums of squares can help in deducing a suitable model. But when there are 
many explanatory variates, the situation becomes more complex as the number of different 
orders in which the variates can be fitted increases rapidly. One solution is the use of auto- 
matic methods for model selection and comparison, and these are described in Section 14.9 
below. In addition to Types I and III, Type II and Type IV sums of squares have also been
defined. These are less commonly used and so are not considered here, but they are briefly
described in Section 11.2.3.



TABLE 14.7

Marginal F-Tests from MLR for Seed Weight with Explanatory Variates
Length (Length) and Hardness Index (Hardness) (Example 14.1F)

Term Dropped   Marginal df   Marginal Sum of Squares   Mean Square   Variance Ratio   P
- Length       1             5440.6436                 5440.6436     660.249          < 0.001
- Hardness     1             70.2985                   70.2985       8.531            0.004
Residual       187           1540.9352                 8.2404

F-tests can also be used in a more general situation to evaluate the effect of simultane- 
ously adding or dropping a group of terms from a model (see Example 17.3B). These tests 
follow exactly the same procedures as outlined above, but they are rarely as useful as 
testing individual terms, and so we shall not give details here. For further information we 
recommend Rawlings et al. (1998). 



14.5 Properties of the Model Parameters and Predicting Responses 

As for the SLR model, in the MLR model, we can use statistical theory to obtain the sam- 
pling distribution of the parameter estimates and make statistical inferences about the 
true unknown parameters. If we can assume that the deviations follow a Normal distri- 
bution (Assumption 4, Section 12.1), then estimates of the model parameters also follow 
Normal distributions. These are unbiased estimates, so the mean of each distribution is the 
unknown population parameter. We represent the estimated variances of these parameters 
as $\widehat{\mathrm{Var}}(\hat{\alpha})$ and $\widehat{\mathrm{Var}}(\hat{\beta}_j)$, j = 1 ... q, for a MLR with q explanatory variates, and $\widehat{\mathrm{SE}}()$ denotes
the corresponding estimated standard error. The formulae represented by this shorthand 
notation are complex and so omitted here - we rely on the calculations made by statistical 
software. However, note that the estimated variances use the estimate of background varia- 
tion based on the residual mean square and so inherit the residual degrees of freedom. 

The most common use of these distributions and variance estimates is in testing the null 
hypothesis that the parameter for a given explanatory variate equals zero, i.e. testing the 
null hypothesis $H_0$: $\beta_j = 0$ against the alternative hypothesis $H_1$: $\beta_j \neq 0$, for the jth
explanatory variate, given that the remaining q - 1 variates are present in the model. This test
statistic is calculated as

$t = \hat{\beta}_j / \widehat{\mathrm{SE}}(\hat{\beta}_j)$ .

Under the null hypothesis, this test statistic follows a t-distribution with degrees of freedom
equal to the residual df, N - p. In fact, this test is equivalent to the marginal F-test obtained by
dropping the jth explanatory variate from the full model, and the marginal F-test statistic is
equal to the square of the t-statistic. Most statistical software prints these t-statistics with the 
parameter estimates (including the intercept parameter). This enables a quick assessment of 
whether individual explanatory variates can be immediately omitted from the model with- 
out worsening the model fit. However, owing to collinearity, these tests can be misleading 
if more than one variate is dropped at a time and so a sequential approach is required. 
Automatic methods of model selection exist and these are discussed in Section 14.9. 

A 100(1 - $\alpha_s$)% confidence interval can be calculated for each regression parameter as

$\left( \hat{\alpha} - t_{N-p}^{[\alpha_s/2]} \times \widehat{\mathrm{SE}}(\hat{\alpha}),\ \hat{\alpha} + t_{N-p}^{[\alpha_s/2]} \times \widehat{\mathrm{SE}}(\hat{\alpha}) \right)$

where, as before, $t_{N-p}^{[\alpha_s/2]}$ is the 100(1 - $\alpha_s$/2)th percentile of the t-distribution with degrees of
freedom equal to the residual df, N - p. The interval is written here for the intercept; the same
form applies to each slope parameter $\hat{\beta}_j$.

TABLE 14.8

Parameter Estimates with Standard Errors (SE), t-Statistics (t) and Observed
Significance Levels (P) for a MLR Model for Seed Weight with Explanatory
Variates Length (Length) and Hardness Index (Hardness) (Example 14.1G)

Term       Parameter    Estimate    SE        t          P
[1]        $\alpha$     -27.795     2.1653    -12.836    < 0.001
Length     $\beta_1$     16.934     0.6590     25.695    < 0.001
Hardness   $\beta_2$     -0.049     0.0168     -2.921      0.004

EXAMPLE 14.1G: DIPLOID WHEAT 

Table 14.8 shows the estimated parameters for a MLR model for seed weight in terms of 
length and hardness index, together with their estimated SE, t-statistic and the observed 
significance level (P) associated with testing the null hypotheses $H_0$: $\alpha = 0$ and $H_0$: $\beta_j = 0$,
for j = 1, 2. The t-tests give evidence that the response depends on each of the explanatory
variates when the other is present in the model, so neither explanatory variate can
be removed without making the fit worse, which agrees with the results of Example
14.1F. We can verify that the squares of the t-statistics are equal to the marginal F-tests
and give exactly the same observed significance level (Table 14.7). 
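
The check described in this example can be sketched numerically. The short Python fragment below recomputes each t-statistic from the rounded estimates and standard errors in Table 14.8, squares it, and sets it beside the marginal variance ratio in Table 14.7. It also forms an approximate 95% confidence interval. The critical value 1.973 is our own approximation to the 97.5th percentile of the t-distribution on 187 df and is not taken from the text, and because the table entries are rounded the agreement is only approximate.

```python
# Estimates and standard errors copied from Table 14.8; marginal variance
# ratios copied from Table 14.7.
TABLE_14_8 = {
    "Length": {"estimate": 16.934, "se": 0.6590},
    "Hardness": {"estimate": -0.049, "se": 0.0168},
}
MARGINAL_F = {"Length": 660.249, "Hardness": 8.531}

# Assumption: approximate 97.5th percentile of the t-distribution on 187 df,
# entered by hand rather than taken from the text.
T_CRIT = 1.973


def summarize(estimate, se, t_crit=T_CRIT):
    """Return the t-statistic, its square and an approximate 95% interval."""
    t = estimate / se
    half_width = t_crit * se
    return {
        "t": t,
        "t_squared": t * t,
        "ci": (estimate - half_width, estimate + half_width),
    }


results = {term: summarize(**row) for term, row in TABLE_14_8.items()}

for term, res in results.items():
    low, high = res["ci"]
    print(f"{term}: t = {res['t']:.3f}, t^2 = {res['t_squared']:.2f}, "
          f"marginal F = {MARGINAL_F[term]}, 95% CI = ({low:.3f}, {high:.3f})")
```

Neither interval contains zero, which is consistent with the small observed significance levels in Table 14.8.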

As for a SLR model, we can use the fitted model to obtain predictions of the response 
for a given set of explanatory variate values, using the form given in Equation 14.3. An 
estimated standard error for the prediction can be calculated, but again we omit the for- 
mula here and obtain the values from statistical software. The concepts of interpolation 
and extrapolation carry over from the SLR model case, but the situation is now somewhat 
more complex because of the interplay between the explanatory variates. For a prediction 
to be considered as interpolation, it must lie within the range of values defined by the set 
of explanatory variates as a whole. For example, in the diploid wheat data (Figure 14.1), 
the observed seed lengths run from 2.5 to 4 mm and the observed hardness index lies 
between -50 and +20. However, if we consider the two-dimensional spread of values, there
is only good coverage in the square defined by lengths in the range 2.75-3.75 mm and 
hardness index in the range -30 to 10. Predictions outside of this area should be considered 
as extrapolation, as there are too few data there to support the form of the model. 

Confidence intervals for any prediction can be calculated, as shown in Section 12.5, to 
obtain an interval for either the expected mean response or for an individual new observa- 
tion. Details can be found in Rawlings et al. (1998). 
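
As a rough illustration of prediction and of the interpolation check described above, the following Python sketch evaluates the fitted model using the estimates in Table 14.8 and flags whether a pair of explanatory values falls inside the well-covered region (length 2.75-3.75 mm, hardness index -30 to 10). The function names are hypothetical, and treating the region as a simple rectangle is a simplification of the two-dimensional spread seen in Figure 14.1. No standard error is computed, as that requires quantities not given in the text.

```python
# Parameter estimates copied from Table 14.8.
ALPHA_HAT = -27.795
BETA_HAT = {"Length": 16.934, "Hardness": -0.049}

# Region with good coverage of observations, as described in the text.
COVERAGE = {"Length": (2.75, 3.75), "Hardness": (-30.0, 10.0)}


def predict_weight(length, hardness):
    """Predicted seed weight from the fitted two-variate MLR model."""
    return ALPHA_HAT + BETA_HAT["Length"] * length + BETA_HAT["Hardness"] * hardness


def is_interpolation(length, hardness):
    """True if both values lie inside the well-covered rectangle."""
    values = {"Length": length, "Hardness": hardness}
    return all(low <= values[name] <= high for name, (low, high) in COVERAGE.items())


for length, hardness in [(3.0, -10.0), (4.0, 15.0)]:
    kind = "interpolation" if is_interpolation(length, hardness) else "extrapolation"
    print(f"Length = {length}, Hardness = {hardness}: "
          f"predicted weight = {predict_weight(length, hardness):.3f} ({kind})")
```

Note that the second point lies within the observed range of each variate separately, but outside the jointly well-covered region, so its prediction should be treated as extrapolation.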



14.6 Investigating Model Misspecification 

As for the SLR model, you should check the validity of the assumptions underlying the
fitted model using the residuals. And, again, you should exclude the possibility of model
misspecification before trying to interpret residual plots. This process is complicated by the
presence of several explanatory variates, any of which may be subject to misspecification.
Here, we describe some graphical methods to verify the form of the model for individual
explanatory variates. Once this has been done, the methods of Sections 5.2 and 13.3 can
be used to check the properties of the standardized or deletion residuals. The methods of
Section 13.4.3 can also be used to investigate the influence of individual observations with
respect to the model as a whole.

The basic building block for all these methods is again the simple residual calculated as
the difference between the observed and fitted values, i.e.

$\hat{e}_i = y_i - \hat{y}_i = y_i - (\hat{\alpha} + \hat{\beta}_1 x_{1i} + \dots + \hat{\beta}_q x_{qi})$ .

Standardized, prediction and deletion residuals can be derived from the simple residuals
as described in Section 13.2. Graphs of standardized or deletion residuals against each
explanatory variate can indicate the presence of curvature in that particular relationship,
but do not put this into the context of the overall model. To do this, we define a new type
of residual, the partial residual. The partial residual for the ith observation on the jth
explanatory variate (j = 1 ... q) is denoted by $\hat{e}_{ij}$ and calculated as the simple residual plus
the contribution of the jth explanatory variate to the ith fitted value, i.e.

$\hat{e}_{ij} = \hat{e}_i + \hat{\beta}_j x_{ji}$ .

This gives a set of N partial residuals for each of the q explanatory variates. A scatter plot
of the partial residuals against the values of the associated explanatory variate, known as
a partial residual plot, should show a scatter of points around the fitted straight line
representing the model. Any systematic deviation, such as substantial curvature in the scatter
of points, may indicate misspecification for that explanatory variate, which might be dealt
with by the methods described in Chapter 17.
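
The partial residual calculation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the eight observations are hypothetical values invented for the sketch (they are not the diploid wheat data), and the two-variate least squares fit is obtained from the centred normal equations rather than from statistical software. The final lines check a useful property: a simple linear regression of the partial residuals on their own explanatory variate recovers the slope estimated in the MLR model, which is why the fitted straight line in a partial residual plot represents the model.

```python
def mean(values):
    return sum(values) / len(values)


def fit_two_variates(y, x1, x2):
    """Least squares fit of y = a + b1*x1 + b2*x2 via centred normal equations."""
    my, m1, m2 = mean(y), mean(x1), mean(x2)
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2


def slr_slope(x, v):
    """Slope from a simple linear regression of v on x."""
    mx, mv = mean(x), mean(v)
    return sum((a - mx) * (b - mv) for a, b in zip(x, v)) / sum((a - mx) ** 2 for a in x)


# Hypothetical data, invented for illustration only.
length = [2.6, 2.9, 3.1, 3.3, 3.5, 3.8, 3.0, 3.4]
hardness = [-20.0, 5.0, -35.0, 10.0, -5.0, -15.0, 0.0, -25.0]
weight = [15.2, 21.0, 26.1, 27.5, 32.0, 37.4, 23.0, 30.9]

a, b1, b2 = fit_two_variates(weight, length, hardness)

# Simple residuals: observed minus fitted values.
simple = [w - (a + b1 * l + b2 * h) for w, l, h in zip(weight, length, hardness)]

# Partial residuals: simple residual plus the contribution of one variate.
partial_length = [e + b1 * l for e, l in zip(simple, length)]
partial_hardness = [e + b2 * h for e, h in zip(simple, hardness)]

print("MLR slopes:", b1, b2)
print("Slopes recovered from partial residuals:",
      slr_slope(length, partial_length), slr_slope(hardness, partial_hardness))
```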

EXAMPLE 14.1H: DIPLOID WHEAT 

Figure 14.3 shows partial residual plots for the MLR model for seed weight with explanatory
variates Length and Hardness, with the fitted component of the model
($\hat{\beta}_j x_{ji}$, j = 1, 2) shown as a straight line in each case. As in the SLR model (Example 12.1), the
relationship with seed length is strong, with a slight hint of curvature at the ends of the
range, where the partial residuals are all above the line representing the fitted model.
However, this curvature is a small component of the overall trend. The relationship
with hardness index is much noisier without any apparent curvature and, although
statistically significant, the downwards trend might not be detected by eye.

FIGURE 14.3

Partial residual plots for a MLR model for seed weight with explanatory variates (a) length (mm) and (b) hardness
index, with fitted component of model ( — ) (Example 14.1H).

The partial residual plot gives insight into the fit of an explanatory variate already included
in the model. Residual plots can also help to investigate the potential role of an additional
explanatory variate that has not yet been included in the model. Several naive approaches
are possible. Plotting the additional explanatory variate against the response (as in the
initial exploratory data analysis) indicates whether a relationship exists in the absence of other
explanatory variates, but it does not show whether a relationship exists after accounting for
the variates already in the model. Plotting the standardized residuals against the additional
variate does take some account of the current model, but the slope of this plot will not be
equal to the estimated slope for the additional variable if it were added into the model (except
in the special case where the additional variable is orthogonal to all of the variates in the
current model). One solution is the added variable plot, in which the observed slope equals the
estimated slope that would be obtained if the explanatory variate were added to the current
model. The added variable plot is a scatter plot of the simple residuals from the current model
against a set of adjusted values of the additional variate. The adjustment obtains the required
slope in the plot. The adjusted values are the simple residuals obtained from fitting a MLR
model with the additional explanatory variate as the response, and using the set of
explanatory variates in the current model. In theory, added variable plots can be used to screen a set
of potential additional variates for inclusion in a model, but in practice, these plots can be
noisy and difficult to interpret. They should generally be used in combination with formal
testing, for example with the incremental F-tests obtained by addition of each of the new
variates in turn to the current model. Further details can be found in Atkinson (1985).
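
The construction of the added variable plot can be sketched as follows. This is an illustrative Python fragment under stated assumptions: the data are hypothetical values invented for the sketch, the current model contains a single variate (so its residuals come from a simple linear regression), and the helper names are our own. The slope in the added variable plot is the regression of one set of residuals on the other. The final check confirms the property claimed above: after removing this slope, the remaining residuals are orthogonal to the constant and to both variates, which is exactly the condition satisfied by the residuals of the larger model, so the slope equals the estimate that would be obtained on adding the variate.

```python
def mean(values):
    return sum(values) / len(values)


def slr_residuals(x, y):
    """Simple residuals from a simple linear regression of y on x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


# Hypothetical data, invented for illustration only.
length = [2.6, 2.9, 3.1, 3.3, 3.5, 3.8, 3.0, 3.4]      # variate in current model
hardness = [-20.0, 5.0, -35.0, 10.0, -5.0, -15.0, 0.0, -25.0]  # additional variate
weight = [15.2, 21.0, 26.1, 27.5, 32.0, 37.4, 23.0, 30.9]      # response

# Vertical axis: simple residuals of the response from the current model.
res_response = slr_residuals(length, weight)
# Horizontal axis: adjusted values, i.e. residuals of the additional variate
# regressed on the variates already in the current model.
res_added = slr_residuals(length, hardness)

# Slope of the added variable plot (both sets of residuals have mean zero).
av_slope = (sum(u * v for u, v in zip(res_added, res_response))
            / sum(u * u for u in res_added))

# Residuals left after accounting for the additional variate.
final = [v - av_slope * u for u, v in zip(res_added, res_response)]

print("Slope in added variable plot:", av_slope)
```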



14.7 Dealing with Correlation among Explanatory Variates 

The term collinearity is used to indicate linear dependencies, or strong correlations, 
between two or more explanatory variates. The simplest case of collinearity occurs when 
two explanatory variates are strongly correlated, either positively or negatively, and this 
can be detected from a pairwise correlation matrix for the full set of explanatory variates 
(e.g. Table 14.1). Perfect collinearity occurs when the correlation between two explanatory 
variates is exactly 1.0 or -1.0, which implies that once one of these variates is included as 
an explanatory term in a model, the other provides no additional information (as it can be 
predicted exactly from the first variate). Clearly, there is then no need to have both variates 
in a MLR model, and in fact there is no unique estimate of parameter values for perfectly 
collinear variates in a MLR model. The simple solution is to include only one of these vari- 
ates. However, perfect collinearity is rarely found in practice; more often two variates will 
be strongly, but not perfectly, collinear. This can be seen in Figure 14.1, where seed length 
and diameter are strongly correlated. In these cases, the second variate may account for 
a small amount of additional variation once the first has been included. Unfortunately, 
the inclusion of strongly collinear variates in a MLR model has the effect of making their 
parameter estimates unstable and uncertain, which is reflected in large standard errors.

Multicollinearity is a more complex concept involving correlation among more than
two variates. A set of variates is said to be perfectly multicollinear if some linear
combination of the variates (with coefficients that are not all zero) adds to a constant value.
For example, suppose we weigh the above- and below-ground biomass of a plant separately,
but then use the above-ground, below-ground and total biomass as explanatory variates: there
is a perfect additive relationship between the three variables. Again, perfect
multicollinearity is uncommon, but approximate multicollinearity can occur and makes parameter
estimates unstable. Multicollinearity can be hard to detect, as it will often not be apparent
from the pairwise correlations. Variance inflation factors (VIFs) can be used to detect
(multi)collinearity within a set of explanatory variates. The VIF values are calculated for
each explanatory variate ($x_1 \dots x_q$) as

$\mathrm{VIF}_j = \frac{1}{1 - R_j^2}, \quad j = 1 \dots q,$

where $R_j^2$ is the coefficient of determination (as defined in Section 12.5, but also see Section
14.8 below) obtained from a MLR model fitting variate $x_j$ as the response in a model that
includes all of the other explanatory variates. Values of the coefficient of determination
$R_j^2$ close to 1 indicate the presence of multicollinearity and result in large values of the
VIF. When the set of explanatory variates is approximately mutually orthogonal (i.e. all
pairwise correlations are close to zero and $R_j^2$ is close to zero for j = 1 ... q), then all the
VIF values will be close to 1. The VIF can be interpreted as the inflation in the variance
of the jth coefficient, $\hat{\beta}_j$, compared to a situation in which the jth explanatory variate is
orthogonal to the other explanatory variates. So, $\mathrm{VIF}_j$ = 10 implies a 10-fold increase in
parameter variance compared with this theoretical orthogonal scenario. Whether
(multi)collinearity is problematic depends on the context, and O'Brien (2007) cautions against
the unthinking use of thresholds (e.g. VIF > 10) to dictate that (multi)collinearity must be
dealt with, for example by removal of one or more explanatory variates. We suggest that
large VIF values (VIF > 10, or equivalently $R_j^2$ > 0.9) should prompt investigation of the
multicollinearity, and consideration of whether it is either desirable or sensible to keep
all the explanatory variates in the model. This is illustrated in the next example, and then
discussed in more generality.
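
The relationship between the coefficient of determination and the VIF is simple enough to check by hand, and the following Python sketch does so. The function names are hypothetical. The values used are the rounded entries reported in Table 14.9. The sketch also shows why a VIF cannot be reliably recovered from a heavily rounded coefficient of determination when that coefficient is close to 1: a value rounded to 0.998 gives a VIF of about 500, whereas the reported VIF of 602.83 corresponds to an unrounded value of about 0.99834.

```python
def vif_from_r2(r2):
    """Variance inflation factor for a given coefficient of determination."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("r2 must lie in [0, 1)")
    return 1.0 / (1.0 - r2)


def r2_from_vif(vif):
    """Coefficient of determination implied by a given VIF."""
    return 1.0 - 1.0 / vif


# The suggested threshold: VIF > 10 is equivalent to R^2 > 0.9.
print("R^2 = 0.9 gives VIF =", vif_from_r2(0.9))

# Rounded values from Table 14.9.
for variate, r2 in [("Hardness", 0.029), ("Moisture", 0.017), ("Length", 0.998)]:
    print(f"{variate}: R^2 = {r2}, VIF = {vif_from_r2(r2):.2f}")

# Working backwards from the reported VIF for Length.
print("VIF = 602.83 implies R^2 =", round(r2_from_vif(602.83), 5))
```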



EXAMPLE 14.1I: DIPLOID WHEAT

We now model seed weight as a MLR model with four explanatory variates: seed length,
hardness index, moisture content and diameter. The model is written in mathematical
form as

$\mathit{Weight}_i = \alpha + \beta_1 \mathit{Length}_i + \beta_2 \mathit{Hardness}_i + \beta_3 \mathit{Moisture}_i + \beta_4 \mathit{Diameter}_i + e_i$ ,

where $\alpha$ is the intercept parameter and $\beta_1$ to $\beta_4$ are the slope parameters for the four
explanatory variates. In symbolic form, the model is written as

Response variable: Weight
Explanatory component: [1] + Length + Hardness + Moisture + Diameter

The $R_j^2$ and $\mathrm{VIF}_j$ values were obtained for each of the explanatory variates in turn
by regression on the remaining explanatory variates, and the results are listed in
Table 14.9.



TABLE 14.9

Coefficient of Determination ($R_j^2$) and Variance Inflation Factors
(VIF) for Four Explanatory Variates (Example 14.1I)

Variate     $R_j^2$    VIF
Length      0.998      602.83
Hardness    0.029        1.03
Moisture    0.017        1.02
Diameter    0.998      602.79


The large VIF values of 602.8 for both diameter and length are expected because of the
strong pairwise correlation (r = 0.999) between these two variates. The small VIF values
(~1.0) for moisture content and hardness index indicate that, in addition to them being
uncorrelated with each of the other explanatory variates individually (Table 14.1), there
is no linear combination of those variates that is related to either moisture content or
hardness index. The impact of the collinearity between length and diameter can be seen
by examination of the parameter estimates from the fitted model, i.e.

$\widehat{\mathit{Weight}}_i = -12.97 - 40.16 \times \mathit{Length}_i - 0.052 \times \mathit{Hardness}_i - 1.81 \times \mathit{Moisture}_i + 90.94 \times \mathit{Diameter}_i$ ,

also listed in Table 14.10. We can compare this to the fitted model obtained with only the
first two variates, length and hardness (see Table 14.8), i.e.

$\widehat{\mathit{Weight}}_i = -27.79 + 16.93 \times \mathit{Length}_i - 0.049 \times \mathit{Hardness}_i$ .



The most striking change is that the large positive coefficient for length in the model 
with two variates (+16.93) has changed to a negative coefficient (-40.16) in the model with 
four variates. This is surprising given the strong positive correlation of length with seed 
weight. The SE of this coefficient has also greatly increased (to be more than 20 times 
larger), as has the SE for the intercept. The coefficient for diameter is large and posi- 
tive, as would be expected from the strong positive correlation between diameter and 
weight, but again with a large SE. The interplay between the coefficients for length and 
diameter may suggest that seeds with a larger diameter than expected for their length 
are also likely to be heavier. However, it is arguable that the small improvement to the fit 
on addition of diameter to a model already including length is not worth the difficulty 
in interpretation and the instability indicated by the large SEs of the parameters. On
balance here, we accept this argument and exclude diameter from the set of explanatory
variates; we continue our search for a model in Section 14.9. In general, it should also be
considered whether the conflict between the coefficients of length and diameter might
be driven by a few seeds with atypical shapes, as this accommodation of a few outlying
values might produce worse predictions for seeds with more typical shapes. This can be
investigated using cross-validation; see Section 14.9.3.

TABLE 14.10

Parameter Estimates with Standard Errors (SE), t-Statistics (t) and
Observed Significance Levels (P) for a MLR Model for Seed Weight
with Four Explanatory Variates (Example 14.1I)

Term       Parameter    Estimate    SE         t         P
[1]        $\alpha$     -12.970     10.1545    -1.277      0.051
Length     $\beta_1$    -40.155     14.5518    -2.759      0.002
Hardness   $\beta_2$     -0.052      0.0162    -3.238    < 0.001
Moisture   $\beta_3$     -1.805      0.9391    -1.922      0.014
Diameter   $\beta_4$     90.943     23.1755     3.924    < 0.001

Collinearity is often found in data sets with few observations, where there is a greater 
chance of spurious correlation. For example, a model may be used to predict insect counts 
in a field from temperature and humidity, but these variables have similar daily and 
annual cycles. If the study covers only warm dry days, then the temperature and humidity 
measurements will be strongly correlated. If some warm but wet days were also included, 
then some contrast between the variables will appear, the correlation decreases, and the 
separate role of the two variables might be identified. A similar situation occurs when 
measurements are obtained within only a small range of the explanatory variates - vari- 
ates that have weak correlation over a wide range of circumstances may appear strongly 
correlated over a restricted range. Of course, the risk of spurious multicollinearity greatly 
increases as the number of explanatory variates increases. This emphasizes the impor- 
tance of carefully 'designing' the observations to be collected in an observational study so 
that the correlations between potential explanatory variables are kept as small as is pos- 
sible. We shall return to this matter briefly in Section 19.1. 

Most statistical software produces warnings (usually based on the VIFs) if substantial 
collinearity is found. Collinearity may also be indicated by a significant overall F-test for
the model when none of the marginal F-tests is significant, or by large changes in parameter
estimates and SEs when a new variate is added to the model (as illustrated in Example
14.1I). However, a change in estimated coefficients does not in itself indicate a problem.
Consider the example of insect counts predicted by temperature and humidity introduced 
above, and suppose that insect counts tend to increase with temperature but decrease with 
humidity. In this case, we might find that humidity had a positive coefficient as a single 
explanatory variate in a SLR model because of the strong correlation between humidity 
and temperature. This could change to a negative coefficient in a MLR model that included 
temperature as an additional variate, as the response to humidity would now be modelled 
after accounting for temperature. The change in value has occurred because of collinearity, 
but the collinearity is not a problem here, as including both explanatory variates produces 
a more realistic model. This demonstrates the importance of understanding the biological 
context of the relationships modelled, and of using this knowledge when constructing a 
model, rather than just using statistical methods blindly. 

Where collinearity occurs there are several possible approaches for dealing with it. If 
collinearity is present but not severe, and it is plausible that all of the correlated variates 
are introducing different biological information into the model, then all of the variates 
should be retained. If information is effectively duplicated across several variables, then 
some can be omitted with little loss of information. If the collinearity is very strong, then 
it is often better to drop variates progressively from the model: dropping either those with 
the largest VIF values, or those with large VIF values but least biological relevance. If you
have few data, then making further observations over a wider range of the explanatory 
variates, in the hope of reducing the observed correlation between them, might help. 

Finally, there are several other statistical techniques that can be used with the full set
of explanatory variates even when substantial collinearity is present. Techniques such as
ridge regression and the lasso aim to minimize the residual sum of squares subject to a
penalty on the size of the regression coefficients. This reduces the variance (and hence
instability) of the regression coefficients, but at the cost of introducing bias, so that the expected
values of the estimates no longer equal their true population values. These methods can
result in a smaller prediction error (the expected squared difference between model
predictions and the true underlying function), although choice of a suitable penalty introduces
some further complexity into the analysis. Further details, with an intuitive comparison of
these techniques, are given in Hastie, Tibshirani and Friedman (2001, Chapter 3).
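
To give a feel for how a penalty stabilizes coefficients under collinearity, the sketch below implements ridge regression for two explanatory variates in plain Python. It is a minimal illustration under several assumptions of our own: the data are hypothetical values invented so that the two variates are strongly collinear, the variates are centred but not standardized (standardization is usual in practice), and the penalty values are arbitrary. The penalized normal equations simply add the penalty to the diagonal of the centred sums-of-squares matrix; with a penalty of zero the function returns the ordinary least squares slopes.

```python
def centre(values):
    m = sum(values) / len(values)
    return [v - m for v in values]


def ridge_two_variates(y, x1, x2, penalty):
    """Ridge slopes for two centred variates; penalty = 0 gives least squares."""
    yc, c1, c2 = centre(y), centre(x1), centre(x2)
    s11 = sum(a * a for a in c1) + penalty
    s22 = sum(b * b for b in c2) + penalty
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * w for a, w in zip(c1, yc))
    s2y = sum(b * w for b, w in zip(c2, yc))
    det = s11 * s22 - s12 * s12
    return ((s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det)


def squared_size(coefficients):
    return sum(b * b for b in coefficients)


# Hypothetical data: 'diameter' is roughly half of 'length' plus small noise.
length = [2.6, 2.9, 3.1, 3.3, 3.5, 3.8, 3.0, 3.4]
diameter = [1.31, 1.44, 1.56, 1.64, 1.76, 1.89, 1.51, 1.69]
weight = [15.2, 21.0, 26.1, 27.5, 32.0, 37.4, 23.0, 30.9]

fits = {penalty: ridge_two_variates(weight, length, diameter, penalty)
        for penalty in (0.0, 0.1, 1.0)}

for penalty, slopes in fits.items():
    print(f"penalty = {penalty}: slopes = {slopes}, "
          f"squared size = {squared_size(slopes):.3f}")
```

The overall size of the coefficient vector shrinks as the penalty grows, which is the source of both the reduced variance and the bias described above.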



14.8 Summarizing the Fit of the Model 

Several goodness-of-fit statistics were introduced in Section 12.5 to summarize the fit of a
SLR model. These statistics can also be calculated for MLR models and used to compare 
different MLR models. In addition, they can help to select which subset of variates should 
be included in a model, as described in Section 14.9. 

The goodness-of-fit statistics presented in Section 12.5 can be used within the context of 
MLR models, after adjusting for the number of model parameters, p. These statistics are pre- 
sented in Table 14.11 with some additional statistics useful for evaluation of MLR models. 

The coefficient of determination ($R^2$) and adjusted coefficient of determination ($R^2_{adj}$ or
adjusted $R^2$) are defined as for the SLR model, although now the model sum of squares
(ModSS) contains several model terms. For MLR models, the adjusted $R^2$ statistic is
usually preferred to the coefficient of determination, as the latter always increases
when a new variate is added to the model, even though there might be no real
improvement in the model fit. The adjusted $R^2$ statistic takes account of the change in both the
model sum of squares and the df through the residual mean square (ResMS) and can
decrease if adding a new variate does not improve the model fit. Note that even if the
adjusted $R^2$ statistic increases when a new variate is added, this might not correspond
to a significant incremental F-test for the new term, and so is no substitute for a formal
statistical test.

Other, more sophisticated, statistics are also available for comparing different models
fitted to the same response variable. The information criteria, AIC (Akaike information
criterion) and SBC (Schwarz Bayesian information criterion), are widely used for model
comparison. Both criteria multiply the natural logarithm of the ResSS by the number of
observations, N, and then apply a penalty that depends on the number of parameters
estimated, p. If the decrease in the ResSS on adding new explanatory variates into the model is
large when set against the penalty for introducing those additional df into the explanatory
model, then the values of the criteria decrease. Good models therefore correspond to small
values of the information criteria. In the AIC, the penalty is simply twice the number of
estimated parameters. In the SBC (sometimes also abbreviated as SBIC or BIC), the penalty
is the number of estimated parameters, p, multiplied by $\log_e(N)$. The SBC penalty therefore
takes account of both the number of parameters estimated and the number of
observations. In practice, the SBC tends to give preference to simpler models than does the AIC.
Both criteria can produce a ranking of competing models and can be useful for screening
numerous possible models, as discussed in the next section. However, these criteria do not
provide a formal test of difference in fit between competing models and small differences
in criterion value may not indicate any meaningful difference in model fit.

TABLE 14.11

Statistics Used to Assess Goodness of Fit in MLR Models

Statistic                                                Formula
Coefficient of determination ($R^2$)                     $R^2 = \frac{\mathrm{RegSS}}{\mathrm{TotSS}} = 1 - \frac{\mathrm{ResSS}}{\mathrm{TotSS}}$
Adjusted coefficient of determination ($R^2_{adj}$)      $R^2_{adj} = 1 - \frac{\mathrm{ResMS}}{\mathrm{TotMS}} = 1 - \frac{N-1}{N-p}\,(1 - R^2)$
Akaike information criterion (AIC)                       $\mathrm{AIC} = N \times \log_e(\mathrm{ResSS}) + 2 \times p$
Schwarz Bayesian information criterion (SBC)             $\mathrm{SBC} = N \times \log_e(\mathrm{ResSS}) + \log_e(N) \times p$
Mallows' Cp                                              $C_p = \frac{\mathrm{ResSS}_p}{\mathrm{ResMS}_{\mathrm{full}}} + (2 \times p) - N$

The Mallows' Cp statistic corresponds to a situation with a total of m potential explanatory
variates and is used to compare the fit of a sub-model containing q of these variates
to the full model containing all of the m variates. In the Mallows' Cp formula, $\mathrm{ResMS}_{\mathrm{full}}$
corresponds to the residual mean square of the full model (i.e. including all m explanatory
variates) and $\mathrm{ResSS}_p$ is the residual sum of squares for a sub-model of interest that contains
q explanatory variates (q < m) and hence p = q + 1 parameters. The value of Cp for the full
model always equals m + 1, the total number of parameters in that model. Any sub-model 
containing q variates that has a similar value of the residual mean square (and hence simi- 
lar precision) to the full model, will have a Cp value close to p. The Mallows' Cp statistic 
can also be used to screen competing models, and is best visualized by plotting values of 
Cp against p together with the line Cp = p, so that good models appear close to this 1:1 line. 
Again, this statistic does not provide a formal test of difference in fit between models. 

The use of these statistics in model comparison is illustrated in Example 14.1J. 
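
Before turning to that example, the formulae in Table 14.11 can be checked against numbers already reported for the model containing length and hardness index. The Python sketch below is a worked calculation under stated assumptions: the residual sum of squares (1540.9352) is taken from Table 14.7, the sample size N = 190 is inferred from the 187 residual df plus p = 3 parameters, and the residual sum of squares of the three-variate model is back-calculated from the rounded incremental variance ratio (F = 2.57 on 1 and 186 df) quoted in Example 14.1J. The function names are our own. The results agree, after rounding, with the corresponding row of Table 14.12.

```python
import math

N = 190            # inferred: 187 residual df + 3 parameters
P_SUB = 3          # parameters in the model [1] + Length + Hardness
RES_SS_SUB = 1540.9352   # residual sum of squares from Table 14.7
R2_SUB = 0.7888          # coefficient of determination from Table 14.12


def aic(res_ss, n, p):
    return n * math.log(res_ss) + 2 * p


def sbc(res_ss, n, p):
    return n * math.log(res_ss) + math.log(n) * p


def adjusted_r2(r2, n, p):
    return 1.0 - (n - 1) / (n - p) * (1.0 - r2)


def mallows_cp(res_ss_sub, res_ms_full, n, p):
    return res_ss_sub / res_ms_full + 2 * p - n


# Back-calculate the full-model residual mean square from the reported
# incremental F-test: F = (ResSS_sub - ResSS_full) / (ResSS_full / 186).
F_MOISTURE, DF_FULL = 2.57, 186
res_ss_full = RES_SS_SUB / (1.0 + F_MOISTURE / DF_FULL)
res_ms_full = res_ss_full / DF_FULL

aic_sub = aic(RES_SS_SUB, N, P_SUB)
sbc_sub = sbc(RES_SS_SUB, N, P_SUB)
adj_sub = adjusted_r2(R2_SUB, N, P_SUB)
cp_sub = mallows_cp(RES_SS_SUB, res_ms_full, N, P_SUB)

print(f"AIC = {aic_sub:.1f}, SBC = {sbc_sub:.1f}, "
      f"adjusted R^2 (x100) = {100 * adj_sub:.2f}, Cp = {cp_sub:.1f}")
```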



14.9 Selecting the Best Model 

When there are many explanatory variates, subsets of them can be formed in many dif- 
ferent ways giving numerous possible models. In this situation, one of the main aims 
of regression analysis is to choose a subset of explanatory variates that provide a good 
description of the response. Here, a good model is one that accounts for the maximum 
amount of variation in the response with the minimum possible number of parameters, 
following the principle of parsimony. The process of finding such a model is known as 
model selection. 

Procedures for model selection fit a number of possible models, which are assessed by 
one or more summary statistics, usually the goodness-of-fit statistics defined in Section 
14.8. If several models appear to perform equally well, each of these candidate models may 
be studied in detail to detect collinearity, misspecification, or departures from the under- 
lying assumptions. The statistical significance of the estimated parameters should be 
checked, and any biological interpretation of the model parameters should be considered. 

Ideally, candidate models should be selected from the set of all possible models. The
number of possible models increases rapidly with the total number of explanatory variates,
however. With q explanatory variates, there are $2^q$ different possible models (including the
null model which contains the constant term only). For example, if we have 15 explanatory
variates, then there are $2^{15}$ = 32,768 possible models. The fitting and comparison of all
possible models is often not computationally practical when there are many explanatory
variates and, in these cases, automatic model selection strategies are usually employed.
Some caution is required: if there are more explanatory variates than observations, then an
exact fit to the data can be obtained, but this fit is completely uninformative with respect
to the role of the explanatory variates in a larger study; this is an example of over-fitting
(see also Section 17.1.2), where the model fits spurious detail at the expense of describing
the larger-scale trends.

In fact, the practice of selecting models and estimating parameters on the same data is
fraught with dangers, simply because models that over-fit often give better goodness-of-fit
statistics. Their parameter estimates will be subject to bias resulting from the model
selection procedure (selection bias), and our assessment of model fit will be over-optimistic
(predicted errors will be too small). Fortunately, some of these problems can be reduced
by cross-validation (introduced in Section 13.5). We return to this in Section 14.9.3 after
we have described some common techniques for selecting models, and their associated
problems, in more detail.

In the following example, we first illustrate model selection by fitting and comparing all
possible subsets of explanatory variates (known as all subsets selection). In Section 14.9.1,
we then discuss some of the automatic sequential procedures for selecting models that are
useful for large numbers of explanatory variates.
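
The size of the all subsets search can be made concrete with a short enumeration. The Python sketch below lists every candidate model for a given set of explanatory variates, written in the symbolic form used in this chapter, and confirms the count of $2^q$. The function name is hypothetical, and the sketch only generates the model formulae; fitting each model would be left to statistical software.

```python
from itertools import combinations


def all_subset_models(variates):
    """List every subset model, from the null model [1] to the full model."""
    models = []
    for size in range(len(variates) + 1):
        for subset in combinations(variates, size):
            models.append(" + ".join(("[1]",) + subset))
    return models


models = all_subset_models(["Length", "Hardness", "Moisture"])
for model in models:
    print(model)

print("Number of models with 3 variates:", len(models))
print("Number of models with 15 variates:", 2 ** 15)
```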

EXAMPLE 14.1J: DIPLOID WHEAT 

To find the best set of explanatory variates to describe seed weight, we fitted models 
with all subsets of the explanatory variates seed length, hardness index and moisture 
content. The variate diameter was excluded because of its collinearity with length, as 
discussed in Example 14.1I. A summary of the goodness-of-fit statistics obtained for the
eight possible models is in Table 14.12.

TABLE 14.12

Summary Statistics for MLR Models for Seed Weight: Goodness-of-Fit Statistics for All
Possible Subset Models (Example 14.1J)

Explanatory Model                     p    $R^2$ (x100)   $R^2_{adj}$ (x100)   AIC       SBC       Cp
[1]                                   1        0.00            0.00           1692.0    1695.3    704.6
[1] + Length                          2       77.91           77.79           1407.1    1413.6     11.2
[1] + Hardness                        2        4.29            3.78           1685.7    1692.2    668.4
[1] + Moisture                        2        0.39            0.00           1693.3    1699.8    703.1
[1] + Length + Hardness               3       78.88           78.65           1400.6    1410.4      4.6
[1] + Length + Moisture               3       78.09           77.85           1407.6    1417.3     11.6
[1] + Hardness + Moisture             3        5.03            4.02           1686.2    1696.0    663.7
[1] + Length + Hardness + Moisture    4       79.16           78.83           1400.0    1413.0      4.0

Note: The 'best' model for each statistic is indicated in bold.

The 'best' model according to each statistic is highlighted in bold type. For $R^2$ and
adjusted $R^2$, larger values indicate better models, and both statistics have their largest
values for the full model with all three explanatory variates. For the AIC and SBC
statistics, smaller values indicate better models. The AIC takes its minimum value for the
model with all three variates, but the SBC takes its minimum value for the model with
two variates: length and hardness index. On further inspection, this model is also close
to the optimum values for AIC and both coefficients of determination. The Mallows'
Cp statistic is relatively, but not convincingly, close to the target value of three for the
model containing length and hardness index, but is much larger for all other one- and 
two-variate models. 
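The statistics in Table 14.12 are linked to one another, and a rough check can be sketched in Python. This sketch assumes N = 190 observations (inferred from the 186 residual df of the four-parameter full model), the usual definition of adjusted R², Mallows' Cp computed with the full-model residual mean square, and AIC and SBC that differ only in their penalties (2p versus p ln N). These definitions are assumptions made for illustration; because the tabulated R² values are rounded, the results agree with the table only approximately.

```python
import math

N = 190  # assumed: 186 residual df for the 4-parameter full model implies N = 190

# (model, p, R-squared as a proportion, AIC, SBC) copied from Table 14.12
rows = [
    ("[1]", 1, 0.0000, 1692.0, 1695.3),
    ("Length", 2, 0.7791, 1407.1, 1413.6),
    ("Length + Hardness", 3, 0.7888, 1400.6, 1410.4),
    ("Length + Hardness + Moisture", 4, 0.7916, 1400.0, 1413.0),
]

def adjusted_r2(r2, n, p):
    """Adjusted R-squared for a model with p parameters (including the intercept)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p)

def mallows_cp(r2, r2_full, n, p, p_full):
    """Mallows' Cp, using the full-model residual mean square as the variance estimate."""
    return (1.0 - r2) / (1.0 - r2_full) * (n - p_full) - n + 2 * p

r2_full, p_full = rows[-1][2], rows[-1][1]
results = {}
for name, p, r2, aic, sbc in rows:
    results[name] = {
        "adj_r2_x100": 100 * adjusted_r2(r2, N, p),
        "cp": mallows_cp(r2, r2_full, N, p, p_full),
        "sbc_minus_aic": sbc - aic,
        "penalty_gap": p * (math.log(N) - 2),  # SBC - AIC under the assumed definitions
    }
    print(name, {key: round(value, 2) for key, value in results[name].items()})
```

Under these assumptions the full model always has Cp equal to its own number of parameters, which is why Cp is most useful for judging the smaller models.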

Our candidate models for closer inspection are therefore the full model and the
model containing length and hardness index. The simplest quantitative way of comparing
these two models is to examine the incremental F-test for adding moisture content
to a model already containing length and hardness index. The variance ratio for this test
has value F = 2.57 with 1 and 186 df (P = 0.111). This suggests that there is no statistical
improvement achieved by addition of moisture content to the simpler model. We have
already found no evidence of misspecification for the model with length and hardness
index as explanatory variates (Example 14.1H) and residual plots for this model accord
with the assumptions regarding the deviations (Figure 14.4), although there is still a
suggestion of underestimation of seed weight for very small and very large fitted values.
The fitted model was presented in Example 14.1C, and the parameter estimates were
listed in Table 14.8.
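The incremental F-statistic can be recovered approximately from the R² values of the two nested models. The sketch below assumes N = 190 (inferred from the 186 residual df) and uses the rounded R² values from Table 14.12, so it gives roughly 2.5 rather than exactly the reported 2.57; the function name is an illustrative assumption.

```python
def incremental_f(r2_reduced, r2_full, n, p_full, q=1):
    """F-statistic for adding q terms, computed from the R-squared of nested models."""
    numerator = (r2_full - r2_reduced) / q          # extra variation explained per added term
    denominator = (1.0 - r2_full) / (n - p_full)    # residual variation per residual df
    return numerator / denominator

# R-squared values (as proportions) for the models with and without moisture content
f_moisture = incremental_f(r2_reduced=0.7888, r2_full=0.7916, n=190, p_full=4)
print(round(f_moisture, 2))
```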

The final predictive model can be written as 

y(Length, Hardness) = -27.79 + 16.93 Length - 0.049 Hardness.



This model can be used to predict the potential gain in plant yield that might be expected
for a given increase in seed length and decrease in hardness index, on the assumption
that this could be achieved without affecting other aspects of the plant, such as the total
number of seeds.

FIGURE 14.4

Composite set of residual plots based on standardized residuals for a MLR model for seed weight with
explanatory variates length and hardness index (Example 14.1J).
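As a small illustration of how the final predictive model for seed weight might be used, the sketch below evaluates the predicted change for a hypothetical change in the two variates. The coefficients are those quoted in the text; the function names and the size of the changes are assumptions chosen only for illustration, and the result is in the units of the seed weight data.

```python
def predict_seed_weight(length, hardness):
    """Fitted seed weight from the final model (coefficients quoted in the text)."""
    return -27.79 + 16.93 * length - 0.049 * hardness

def predicted_change(delta_length, delta_hardness):
    """Change in fitted seed weight for given changes in length and hardness index."""
    return 16.93 * delta_length - 0.049 * delta_hardness

# hypothetical scenario: seed length up by 0.1 units, hardness index down by 10 units
gain = predicted_change(delta_length=0.1, delta_hardness=-10.0)
print(round(gain, 3))
```

Because the model is linear, the predicted change does not depend on the starting values of length and hardness index.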



14.9.1 Strategies for Sequential Variable Selection 

Automatic sequential selection strategies are often preferred when there are so many 
explanatory variates that fitting and comparing all sub-models becomes impractical. 
Automatic methods usually select a good set of explanatory variates, though not necessar- 
ily the 'best' set. They are designed to be computationally efficient and so to investigate a 
relatively small number of the possible sub-models, but still have the pitfalls of bias, over- 
fitting and over-optimism that we discuss in Section 14.9.2. 

All of the sequential methods start with an initial or reference model, and at each step of 
the process they add or remove one explanatory variate. In all of the strategies presented 
here, the change at each step is evaluated via an F-test, and a threshold value must be 
chosen to control the process. It is easiest to explain the detail of these concepts in context, 
so below, we describe the three most common automatic selection methods: forward selec- 
tion, backward elimination and stepwise regression. 

Forward selection starts with the baseline model, containing the intercept term alone, 
and at each step adds the explanatory variate that gives the biggest improvement to the 
model fit, as measured by an incremental F-test, subject to this exceeding a threshold. This 
threshold can be defined as the minimum value of the incremental F-statistic that must be 
achieved in order for the variate to enter into the model, denoted F_in. If the F_in threshold
value is chosen to be large, then the final model tends to contain fewer variates than for a
smaller F_in threshold. The threshold can alternatively be defined in terms of the observed
significance level of the incremental F-statistic, denoted SLE (significance level to enter). In 
this case, terms are added into the model only if the observed significance level is smaller 
than the SLE; the choice of a smaller SLE value leads to a final model with fewer terms. 
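Forward selection as described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the algorithm of any particular package: the data are synthetic and invented for the example, the helper names are hypothetical, and residual sums of squares are obtained by Gram-Schmidt orthogonalization to keep the sketch free of external libraries.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def rss(columns, y):
    """Residual sum of squares for y regressed on an intercept plus the given columns."""
    resid, basis = list(y), []
    for col in [[1.0] * len(y)] + [list(c) for c in columns]:
        q = list(col)
        for b in basis:  # remove what the earlier terms already explain
            coef = dot(q, b) / dot(b, b)
            q = [qi - coef * bi for qi, bi in zip(q, b)]
        if dot(q, q) < 1e-12:  # collinear column adds nothing new, so skip it
            continue
        coef = dot(resid, q) / dot(q, q)
        resid = [ri - coef * qi for ri, qi in zip(resid, q)]
        basis.append(q)
    return dot(resid, resid)

def forward_selection(data, y, f_in=2.0):
    """Add one variate per step while the best incremental F-statistic exceeds f_in."""
    selected, history = [], []
    remaining = list(data)
    current_rss = rss([], y)
    while remaining:
        resid_df = len(y) - (len(selected) + 2)  # intercept + current terms + candidate
        if resid_df <= 0:
            break
        best = None
        for name in remaining:
            new_rss = rss([data[v] for v in selected + [name]], y)
            f_stat = (current_rss - new_rss) / (new_rss / resid_df)
            if best is None or f_stat > best[1]:
                best = (name, f_stat, new_rss)
        if best[1] <= f_in:  # no candidate exceeds the threshold, so stop
            break
        selected.append(best[0])
        remaining.remove(best[0])
        history.append(best[:2])
        current_rss = best[2]
    return selected, history

# Synthetic data, invented for illustration: y depends on x1 and x2 but not on x3
x1 = [-1, -1, -1, -1, 1, 1, 1, 1]
x2 = [-1, -1, 1, 1, -1, -1, 1, 1]
x3 = [-1, 1, -1, 1, -1, 1, -1, 1]
y = [10 + 5 * a + 2 * b + 0.5 * a * b + 0.3 * b * c for a, b, c in zip(x1, x2, x3)]
data = {"x1": x1, "x2": x2, "x3": x3}

selected, history = forward_selection(data, y, f_in=2.0)
print(selected)
for name, f_stat in history:
    print(name, round(f_stat, 2))
```

In this synthetic example, x2 has an incremental F-statistic below 2 when tested against the null model, but a much larger one once x1 has entered and reduced the residual mean square.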

Backward elimination starts with the full model, i.e. the model containing all the explanatory
variates, and at each step eliminates the variate that gives the smallest change to the
model fit, as measured by a marginal F-test, subject to this being smaller than a threshold.
This threshold can be defined as the maximum value of the marginal F-statistic that is
allowed for a variate to be eliminated from the model, denoted F_out. If the F_out threshold
value is chosen to be large, then the final model tends to contain fewer variates than for a
smaller F_out threshold. The threshold can alternatively be defined in terms of the observed
significance level for the marginal F-statistic, denoted SLS (significance level to stay). In
this case, terms are eliminated from the model if the observed significance level is larger 
than the SLS; the choice of a smaller SLS value leads to a final model with fewer terms. 
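Backward elimination can be sketched in the same spirit. Again this is an illustration under stated assumptions rather than any package's implementation: the data are synthetic, the helper names are hypothetical, and the marginal F-statistic for each variate is computed by refitting the model without it.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def rss(columns, y):
    """Residual sum of squares for y regressed on an intercept plus the given columns."""
    resid, basis = list(y), []
    for col in [[1.0] * len(y)] + [list(c) for c in columns]:
        q = list(col)
        for b in basis:  # remove what the earlier terms already explain
            coef = dot(q, b) / dot(b, b)
            q = [qi - coef * bi for qi, bi in zip(q, b)]
        if dot(q, q) < 1e-12:  # collinear column adds nothing new, so skip it
            continue
        coef = dot(resid, q) / dot(q, q)
        resid = [ri - coef * qi for ri, qi in zip(resid, q)]
        basis.append(q)
    return dot(resid, resid)

def backward_elimination(data, y, f_out=4.0):
    """Drop one variate per step while the smallest marginal F-statistic is below f_out."""
    selected, history = list(data), []
    while selected:
        full_rss = rss([data[v] for v in selected], y)
        resid_df = len(y) - (len(selected) + 1)  # intercept + current terms
        worst = None
        for name in selected:
            reduced = [data[v] for v in selected if v != name]
            f_stat = (rss(reduced, y) - full_rss) / (full_rss / resid_df)
            if worst is None or f_stat < worst[1]:
                worst = (name, f_stat)
        if worst[1] >= f_out:  # every remaining variate passes the threshold, so stop
            break
        selected.remove(worst[0])
        history.append(worst)
    return selected, history

# Synthetic data, invented for illustration: y depends on x1 and x2 but not on x3
x1 = [-1, -1, -1, -1, 1, 1, 1, 1]
x2 = [-1, -1, 1, 1, -1, -1, 1, 1]
x3 = [-1, 1, -1, 1, -1, 1, -1, 1]
y = [10 + 5 * a + 2 * b + 0.5 * a * b + 0.3 * b * c for a, b, c in zip(x1, x2, x3)]
data = {"x1": x1, "x2": x2, "x3": x3}

selected, history = backward_elimination(data, y, f_out=4.0)
print(selected)
print([name for name, _ in history])
```

Raising f_out makes the procedure more severe: with a very large threshold every variate in this example would eventually be dropped.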

Both of these strategies can run into problems caused by multicollinearity among 
explanatory variates. In forward selection, a variable selected at an early step might not be 
required in the model once certain other variates have also been included. It might then 
be appropriate to remove it from the model, but this is not allowed within the forward 
selection framework. The reverse situation can occur with backward elimination, where 
variates removed at an early stage might later be used to improve the model, if this were 
allowed. The stepwise strategy incorporates such additional steps into the selection pro- 
cess. In its most general form, this procedure evaluates at each step the effect of dropping 
each of the explanatory variates currently in the model, and the effect of adding each 
explanatory variate currently excluded. Model fit is quantified by some goodness-of-fit 
statistic (often ResMS) and the step that gives the best value of this statistic will be taken, 
subject to the change passing the forward/backward threshold criterion. Many variants
on this procedure are possible. For example, the variant forward stepwise selection starts
as forward selection, but after each forward step, it switches to backward elimination,
until no further variates can be dropped, and then forward selection is resumed. The variant
backward stepwise selection uses the reverse procedure. In all cases, the final model
is obtained when no further changes can be made. However, if the threshold F_in is smaller
than threshold F_out (or SLE > SLS), then stepwise algorithms can get caught in a loop, adding
and then dropping the same variable repeatedly, and so a maximum number of steps
is often specified. Because of the switch between forward and backward steps, stepwise 
selection is more flexible than the single direction strategies and should therefore have a 
greater chance of selecting a good model for the data. 

Clearly, the threshold values of F_in and F_out (or SLE and SLS) have a large influence on
the model selected. The significance level associated with a particular F_in will obviously
depend on the residual degrees of freedom, which decrease as new variates are added, so
that fixing F_in results in the significance level increasing at each step. Conversely, fixing
the significance level (SLE) results in the incremental F-statistic threshold increasing as
more terms are added to the model. These changes will be small unless the residual df are
small or the number of explanatory variates added is large. Conversely, the significance
level associated with a fixed F_out will decrease as terms are dropped, and the marginal
F-statistic threshold associated with a fixed significance level (SLS) will also decrease. But
in both cases, remember that the residual mean square changes as terms are added or
dropped and this may perturb the expected pattern.

For forward selection, it is often argued that the ResMS of the null model will be much
larger than true background variation because it includes contributions from important
explanatory variates not yet entered into the model. This will reduce the observed F-statistics
so that a smaller threshold of F_in (or equivalently a larger value of SLE) is appropriate. A
typical value of F_in is 2, and SLE values are commonly set around 0.15. These values are
approximately equivalent for large data sets (N > 100), but the criterion SLE < 0.15 is more
stringent for models with fewer residual df. It can be argued that these thresholds should
be tightened (i.e. F_in increases, SLE decreases) as more explanatory variates are included in
the model and any inflation of the ResMS is reduced. Backward selection starts with all of
the explanatory variates in the model and so the ResMS should not be inflated, although it
may be an unreliable estimate if the residual df of the full model is very small; in this case,
backward selection is not advised. A typical value of F_out for backward selection is 4,
corresponding to a SLS value of 0.05 (for N = 60). An initial discrepancy between F_in and F_out (or
SLE and SLS) values is therefore based on a sound rationale despite the fact that it can cause
recursion in stepwise selection procedures.
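The correspondence between F thresholds and significance levels quoted above can be checked numerically. The sketch below is one way of doing so using only the Python standard library: it relies on the fact that an F-statistic on 1 and v df is the square of a t-statistic on v df, and integrates the t density with Simpson's rule. The function names and the list of residual df are illustrative assumptions.

```python
import math

def t_density(t, df):
    """Probability density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log(1 + t * t / df))

def f_tail_1df(f_value, resid_df, steps=2000):
    """P(F > f_value) for an F-distribution on 1 and resid_df degrees of freedom.

    Uses F(1, df) = t(df)^2 and Simpson's rule (steps must be even) on the t density.
    """
    upper = math.sqrt(f_value)
    h = upper / steps
    total = t_density(0.0, resid_df) + t_density(upper, resid_df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, resid_df)
    central = 2 * total * h / 3  # P(-upper < T < upper)
    return 1.0 - central

# significance levels implied by fixed thresholds F = 2 and F = 4
for resid_df in (10, 20, 58, 100, 500):
    print(resid_df, round(f_tail_1df(2.0, resid_df), 3), round(f_tail_1df(4.0, resid_df), 3))
```

The printed values show the significance level for a fixed F threshold rising as the residual df fall, which is the pattern described in the text.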

The details of the selection procedures, such as use of F_in and F_out rather than SLE and
SLS and default values for the thresholds, differ among statistical packages; you should 
therefore always check the documentation. These variations mean that packages may 
select different final models, again indicating the need for cautious use of such approaches. 
We suggest that automatic selection procedures are used as the first step in a modelling 
exercise, followed by comparison of the 'best' model(s) identified and taking account of 
the biological context. The analysis in the following example, which illustrates the use of 
stepwise selection, was done using GenStat. 

EXAMPLE 14.2: APHID CATCH 

The EXAMINE project collated data on aphid catches in suction traps across Europe
(www.rothamsted.ac.uk/examine/) to investigate environmental and landscape influences
on the timing of aphid flight and abundance. Here, we investigate the relationship
between the Julian day of the first catch of the aphid Myzus persicae at 50 locations
(Figure 14.5) during 1995 and several geographical, meteorological and land-use variates.
The three geographical variates were latitude, longitude and altitude of each trap.
The 10 meteorological variates were monthly rainfall from October 1994 to May 1995,
mean temperature for the coldest 30-day period at that trap site and mean temperature
for the following 60-day period. These variates were chosen as those most likely to
affect aphid flight dates, which were expected to be earlier in warmer, drier climates.
The eight land-use variates gave the proportions of land in a circle of radius 75 km
around the sampling site under different uses (coniferous, deciduous or mixed forest,
grassland, arable land, inland waters, sea or urban). Note that these proportions do not
sum to 1 for most sites as several land-use categories with overall small proportions
have been omitted. The data set can be found in file examine.dat and in Table A.3. Table
14.13 lists the explanatory variates and their symbolic names.

FIGURE 14.5

Approximate locations of 50 EXAMINE project aphid suction traps in 1995 in 11 European countries (with
capital cities): Belgium (Brussels), Czech Republic (Prague), France (Paris), Germany (Berlin), Greece
(Athens), Hungary (Budapest), Italy (Rome), Luxembourg (Luxembourg), Slovenia (Ljubljana), Sweden
(Stockholm), UK (London) (Example 14.2).

The Julian day of first catch ranges between 1 (1 January) and 205 (24 July) with mean 
124.3 (4 May), lower quartile 100 (10 April) and upper quartile 146 (26 May). We first con- 
sider exploratory data analysis, as strong correlations between geographic and climate 
variables are likely, but a scatter plot matrix becomes impractical with 21 explanatory 
variates. A correlation matrix of the response variate (Julian day of first catch) and all 
explanatory variates can be scanned for instances of strong correlation. The response 
(date of first catch) shows a strong positive correlation (r = 0.73) with the trap site latitude 
and a strong negative correlation (-0.80) with the mean temperature in the 60 days after 
the coldest period. The strongest correlation (0.90) within the set of explanatory variates 
is between the mean temperature in the coldest 30-day period (denoted C30Day, see 
Table 14.13) and that in the following 60 days (denoted F60Day). This strong correlation
is expected, but these variates are together intended to quantify the depth and length of
the winter period, and so both will be retained for analysis. Not surprisingly, there is a
negative relationship between latitude and C30Day (-0.48) or F60Day (-0.70) as winter
temperature decreases as latitude increases. There are also strong positive correlations 
(0.70-0.76) between monthly rainfall in December, January and February. Finally, there 
is a strong positive correlation between the proportion of mixed and deciduous forest 
close to a trap site (0.72) and a positive correlation between monthly May rainfall and
the proportion of mixed forest (0.79). All other correlations are less than 0.70 in absolute
value.

TABLE 14.13

Explanatory Variates Available for Modelling Date of First Aphid Catch (Example 14.2)

Description                    Name         Description                    Name
Weather Variables                           Geographic Variables
  Monthly rainfall:                           Site latitude                Latitude
    October                    OctRain        Site longitude               Longitude
    November                   NovRain        Site altitude                Altitude
    December                   DecRain      Land-Use Variables
    January                    JanRain        Proportion of area under:
    February                   FebRain          Coniferous forest          ConForest
    March                      MarRain          Deciduous forest           DecForest
    April                      AprRain          Mixed forest               MixForest
    May                        MayRain          Grassland                  Grassland
  Mean temperature during:                      Arable crops               Arable
    Coldest 30-day period      C30Day           Inland water               InlandWater
    Following 60-day period    F60Day           Sea                        Sea
                                                Urban                      Urban

There are more than two million subsets of 21 explanatory variates, so testing all 
possible subsets is impractical. Instead, automatic model selection strategies were 
implemented with GenStat. Here, four selection strategies were used for the full set of 
21 explanatory variates: forward selection, backward elimination and stepwise selec- 
tion starting from either the null model (forward stepwise selection) or the full model 
(backward stepwise selection). Thresholds were chosen as F_in = 2 and F_out = 4. Hence,
variates were added into the model if their incremental F-statistic was greater than 2 
and were dropped if their marginal F-statistic was less than 4. The results for forward 
selection and forward stepwise selection were the same except for variate MarRain 
(monthly March rainfall) which the stepwise procedure cycled over adding and drop- 
ping (with F = 2.45). The models for backward elimination and backward stepwise 
selection were also the same except for variate MarRain which the stepwise procedure 
again cycled over adding and dropping (with F = 2.15). In both cases, the marginal 
F-tests for the MarRain variate were not significant (P > 0.05) and it was excluded. The 
models from the forward and backward strategies then accounted for 89.3% and 89.2%
of the variation (adjusted R²), respectively. The set of explanatory variables selected by
forward stepwise selection were (in order, with the variate names as defined in Table
14.13)

F60Day, MayRain, OctRain, FebRain, Urban, DecForest, Longitude and NovRain.

The set of explanatory variables retained by backward stepwise selection were

FebRain, MayRain, OctRain, NovRain, C30Day, DecForest, Urban, Altitude, Latitude
and Longitude.
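Two of the numerical points in this example are easy to verify with a short Python sketch: the number of possible subsets of 21 variates, and why MarRain cycles in the stepwise procedures. The helper name is an illustrative assumption; the sketch uses the fact that the F-statistic for adding a variate to a model equals the marginal F-statistic for that variate once it is in the model, so a value between F_in and F_out is first accepted and then rejected.

```python
n_variates = 21
n_subsets = 2 ** n_variates  # each variate is either in or out of the model
print(n_subsets)

def cycles_in_stepwise(f_stat, f_in=2.0, f_out=4.0):
    """True if a variate would be added (F > f_in) and then dropped again (F < f_out)."""
    return f_in < f_stat < f_out

for f_marrain in (2.45, 2.15):  # MarRain F-statistics reported in the text
    print(f_marrain, cycles_in_stepwise(f_marrain))
```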

To investigate differences in fit between these two models, we can use a scatter plot of 
the two sets of fitted values and calculate the correlation between them. In the scatter 
plot (not shown), the fitted values from the two models are very closely related, which
is reflected in the correlation coefficient of 0.995. The parameter estimates from both
models are shown in Table 14.14.

TABLE 14.14

Parameter Estimates (Standard Errors) and Observed Significance Levels
(P) for MLR Models for Julian Day of First Catch Obtained by Forward
Selection or Backward Elimination Strategies (Example 14.2)

              Forward Stepwise Selection      Backward Stepwise Elimination
Term          Estimate (SE)         P         Estimate (SE)         P
[1]            238.4 (11.43)      < 0.001      -88.4 (52.01)        0.097
Latitude          —                 —           4.68 (0.854)      < 0.001
Longitude       1.16 (0.415)        0.008       1.70 (0.494)        0.001
Altitude          —                 —           0.05 (0.025)        0.047
OctRain        -0.46 (0.076)      < 0.001      -0.39 (0.079)      < 0.001
NovRain         0.23 (0.107)        0.036       0.28 (0.108)        0.013
FebRain         0.37 (0.090)      < 0.001       0.35 (0.095)      < 0.001
MayRain        -0.72 (0.087)      < 0.001      -0.63 (0.097)      < 0.001
C30Day            —                 —          -5.45 (1.577)        0.001
F60Day        -14.74 (1.132)      < 0.001         —                 —
DecForest      92.09 (28.387)       0.002     129.20 (34.911)     < 0.001
Urban        -194.76 (67.595)       0.006    -196.70 (68.107)       0.006

There is a set of seven explanatory variates common to both models, i.e. 

Longitude, OctRain, NovRain, FebRain, MayRain, DecForest and Urban, 

and the parameter estimates for these explanatory variates are broadly similar in 
sign and size in the two fitted models. The backward method has retained Latitude, 
Altitude and C30Day in place of F60Day, which was selected by the forward method.
Taking into account the observed correlations between these explanatory variates
(shown in Table 14.15) and their estimated coefficients, it seems plausible that the
combination of Latitude, Altitude and C30Day accounted for winter temperatures in
an equivalent manner to F60Day.

One strategy for finding a final model is to take a model consisting of the seven vari- 
ates held in common across the two models, and then to make an exhaustive search on 
adding subsets of the four variates that are in disagreement (C30Day, F60Day, Altitude 
and Latitude). The results of this search are shown in Table 14.16, which evaluates the 
adjusted R², AIC and SBC statistics for each model and obtains the observed significance
level (P) associated with the marginal F-test for each of these four explanatory
variates in each model. 

The best model in terms of all three criteria is the one obtained by addition of variate
F60Day only, the same model chosen by the forward selection method. The second best
model for adjusted R² contains the other three variates, i.e. the model selected by the
backward methods, but this is not the second best model for AIC or SBC, which instead
choose the model with F60Day and Latitude, in which the parameter for Latitude is
not significantly different from zero (P = 0.539). On balance, we prefer the simpler,
more parsimonious, model, in which only F60Day is added. Now, we are in position
to look more closely at the properties of this model. Partial residual plots (not shown)
suggest no evidence of model misspecification, and residual plots (Figure 14.6) show
no great cause for concern. The Cook's statistics (not shown) suggest the presence of
one influential observation, but omission of this observation has little impact on the
parameter estimates. We therefore accept the forward selection model and move on to
interpretation.

TABLE 14.15

[Correlations between explanatory variates (Example 14.2); the table contents are not legible in the source.]

TABLE 14.16

Summary Statistics (Adjusted R², AIC and SBC) for Addition of All Possible Subsets of Variates C30Day,
F60Day, Altitude and Latitude to a MLR Model with Seven Other Explanatory Variates, with
Observed Significance of the Marginal F-Test for Each Explanatory Variate (Example 14.2)

      Goodness-of-Fit Statistics         Observed Significance Level for Marginal F-Tests (P)
 p    R²adj (×100)     AIC      SBC      C30Day     F60Day     Altitude   Latitude
 8       46.23       549.68   564.97       —          —          —          —
 9       89.28       469.86   487.06       —        < 0.001      —          —
 9       82.20       495.20   512.41       —          —          —        < 0.001
 9       81.74       496.47   513.68     < 0.001      —          —          —
 9       46.77       549.96   567.17       —          —         0.238       —
10       89.11       471.38   490.50       —        < 0.001      —         0.539
10       89.01       471.83   490.95      0.877     < 0.001      —          —
10       89.01       471.85   490.97       —        < 0.001     0.941       —
10       88.29       475.04   494.16     < 0.001      —          —        < 0.001
10       86.19       483.25   502.37       —          —         0.001     < 0.001
10       81.29       498.44   517.56     < 0.001      —         0.880       —
11       89.15       471.92   492.95      0.001       —         0.047     < 0.001
11       88.96       472.82   493.85       —         0.002      0.513      0.374
11       88.96       472.82   493.86      0.513      0.071       —         0.380
11       88.73       473.82   494.85      0.886     < 0.001     0.959       —
12       88.99       473.37   496.31      0.297      0.522      0.297      0.175

Note: The selected model is indicated in bold.
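The ranking of the 16 candidate models under each criterion can be reproduced from the values in Table 14.16. The Python sketch below simply transcribes the table and sorts it; the tuple layout and variable names are illustrative assumptions.

```python
# (added variates, p, adjusted R-squared x100, AIC, SBC) transcribed from Table 14.16
table_14_16 = [
    ((), 8, 46.23, 549.68, 564.97),
    (("F60Day",), 9, 89.28, 469.86, 487.06),
    (("Latitude",), 9, 82.20, 495.20, 512.41),
    (("C30Day",), 9, 81.74, 496.47, 513.68),
    (("Altitude",), 9, 46.77, 549.96, 567.17),
    (("F60Day", "Latitude"), 10, 89.11, 471.38, 490.50),
    (("C30Day", "F60Day"), 10, 89.01, 471.83, 490.95),
    (("F60Day", "Altitude"), 10, 89.01, 471.85, 490.97),
    (("C30Day", "Latitude"), 10, 88.29, 475.04, 494.16),
    (("Altitude", "Latitude"), 10, 86.19, 483.25, 502.37),
    (("C30Day", "Altitude"), 10, 81.29, 498.44, 517.56),
    (("C30Day", "Altitude", "Latitude"), 11, 89.15, 471.92, 492.95),
    (("F60Day", "Altitude", "Latitude"), 11, 88.96, 472.82, 493.85),
    (("C30Day", "F60Day", "Latitude"), 11, 88.96, 472.82, 493.86),
    (("C30Day", "F60Day", "Altitude"), 11, 88.73, 473.82, 494.85),
    (("C30Day", "F60Day", "Altitude", "Latitude"), 12, 88.99, 473.37, 496.31),
]

# larger adjusted R-squared is better; smaller AIC and SBC are better
by_adj_r2 = sorted(table_14_16, key=lambda row: -row[2])
by_aic = sorted(table_14_16, key=lambda row: row[3])
by_sbc = sorted(table_14_16, key=lambda row: row[4])

for label, ranking in (("adjusted R2", by_adj_r2), ("AIC", by_aic), ("SBC", by_sbc)):
    print(label, "best:", ranking[0][0], "second:", ranking[1][0])
```

All three criteria agree on the best model, and disagree only about the runner-up, which is the situation described in the text.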

The fitted model (Table 14.14) suggests that the date of first catch of M. persicae is
earlier within the year (has smaller fitted values) when the post-winter temperature (as
measured by variate F60Day) is higher, for larger proportions of urban land use around
the trap site, and when there is more rain in the previous October or in May of the
same year. Conversely, the date of first catch is later (larger fitted values) when the trap
site longitude is larger (further east within Europe), when the proportion of deciduous
forest area around the trap site is greater, and when there is more rain in the previous
November or in February of the same year. Of course, these variables do not vary independently,
but a prediction of date of first catch can now be made for any site for which
all these variables have been recorded. As the observations were all made in the same 
year, predictions cannot be made with confidence for other years without expansion of 
the study (as systematic differences between years would be expected), but this model 
should still provide information about the relative difference in flight dates for different 
environmental conditions. 

As already noted, all automatic model selection strategies should be used with caution. 
There is usually no unambiguous 'best' model, because it will depend on the strategy 
used and the thresholds chosen. Similarly, even if all sub-models can be evaluated, the 
'best' model may depend on the selection criterion used. It is often sensible to try several
different approaches, as shown in Example 14.2, to select a few candidate models for
further investigation. These should then be studied in detail with regard to the model
fit, assumptions (by checking residuals and investigating outliers) and to their biological
interpretation.

FIGURE 14.6

Composite set of residual plots based on standardized residuals for a MLR model for Julian day of first aphid
catch with eight explanatory variates (Example 14.2).



14.9.2 Problems with Procedures for the Selection of Subsets of Variables 

Above, we stated that procedures for model selection are subject to bias, over-fitting and 
over-optimism. In this subsection, we attempt to give some insight into these matters. 

We start by considering the assumptions behind the incremental F-test for adding a
term to a model, and how this relates to its role in forward selection. The distribution of
the incremental F-test under the null hypothesis relates to a single pre-determined test.
However, at the first step of the forward selection, the procedure calculates the incremental
F-tests for each of the explanatory variates and chooses the largest to compare with the
F_in threshold. Because we have deliberately chosen the largest of the set, this statistic will
tend to be larger than we should expect under that F-distribution. The true significance
level can therefore be much greater than the nominal value. We may therefore expect that
some of the variables selected are actually unrelated to the response. This can lead to
the phenomenon of over-fitting, where some of these extra variates accommodate random
fluctuations at the expense of the overall trend. Including these extra explanatory variates
in the model also reduces the ResMS and hence estimates of error - this makes estimates
of uncertainty over-optimistic (too small). Similar considerations apply to all of the subset
selection procedures in this section. For example, all subsets selection evaluates all subsets
of a given size and chooses the best: again, chance variation can lead to the inclusion of
some variables with no real underlying relationship with the response.
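The inflation of the true significance level can be illustrated with a simple calculation. The sketch below makes two simplifying assumptions that will not hold exactly in practice: that none of the candidate variates is related to the response, and that their tests are independent. Under those assumptions, the chance that the largest of k F-statistics passes a nominal level is one minus the chance that all k fail.

```python
def chance_of_spurious_entry(n_candidates, nominal_level):
    """P(at least one of n independent null tests is significant at the nominal level)."""
    return 1.0 - (1.0 - nominal_level) ** n_candidates

# nominal SLE of 0.15 applied to the best of k unrelated candidate variates
for k in (1, 5, 10, 21):
    print(k, round(chance_of_spurious_entry(k, 0.15), 3))
```

With correlated explanatory variates the inflation is smaller than this calculation suggests, but the direction of the effect is the same.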

Miller (2002) suggests that one simple way to indicate whether uninformative variates
have been included in a model is to introduce some new explanatory variates generated
from random numbers, then to repeat the selection procedure including these new variates.
The point at which these random variates start appearing in the selected model
cates the point at which no further useful information is being added, or where spurious 
information is being retained. Unfortunately, this approach requires that the number 
of random variates used is equal to the total number of explanatory variates, and so is 
impractical for small data sets with many explanatory variates. In other cases, it can give 
good insight into the possible reliability (or otherwise) of a selected model. 
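The first step of this check, generating one random pseudo-variate per real explanatory variate, might be sketched as follows. The function name, the naming of the new variates and the use of standard Normal random numbers are assumptions made for illustration; the augmented data set would then be passed to whichever selection procedure is being examined.

```python
import random

def add_random_variates(data, n_obs, seed=0):
    """Return a copy of data with one pseudo-variate of random numbers per real variate."""
    rng = random.Random(seed)  # fixed seed so that the check can be repeated
    augmented = dict(data)
    for i in range(1, len(data) + 1):
        augmented[f"Random{i}"] = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
    return augmented

# tiny hypothetical data set with two real explanatory variates and four units
data = {"x1": [1.0, 2.0, 3.0, 4.0], "x2": [0.5, 0.1, 0.9, 0.3]}
augmented = add_random_variates(data, n_obs=4)
print(sorted(augmented))
```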

In addition to the inclusion of spurious variates, the process of model selection means 
that the coefficients of the selected variates tend to be biased. To demonstrate this, we con- 
sider two explanatory variates, both with the same underlying true correlation with the 
response. The variate that, by chance, appears more strongly correlated with the response 
in the observed sample is more likely to be selected, and will also tend to have a larger 
absolute regression coefficient (positive or negative) than expected. This is known as selec- 
tion bias and should not be confused with the systematic bias arising from the omission of 
an important explanatory variate from the model. There is no way to avoid selection bias, 
except by selection of the model on one data set, then estimation of the model parameters 
from another, independent, set. However, the reduction in bias achieved by doing this 
might be outweighed by the increase in uncertainty caused by use of a smaller data set for 
inference. 

These issues are intrinsic to the selection procedures and are discussed in detail by 
Miller (2002). In practice, they are difficult to avoid, but you should be aware of these prob- 
lems and be properly sceptical about the results of any selection procedure. One method 
that can combat over-fitting is cross-validation, and its use in model selection is described 
in the next section. 

14.9.3 Using Cross-Validation as a Tool for Model Selection 

In Section 13.5, cross-validation was used as a tool to diagnose model fit; here we use it as 
a tool for model selection. In this situation, the purpose of cross-validation is to obtain an 
unbiased measure of the predictive ability of competing models, so the model with the best 
performance can be selected. In the simplest case, the data is partitioned into two parts: 
the training set and the validation set. The training set is used to estimate parameters for 
some candidate models. We then obtain predictions from each of the models for the units 
in the validation set. The predictive ability of the candidate models can be evaluated by 
calculation of statistics such as the mean square error of prediction (MSEP or RMSE), mean 
absolute difference (MAD) or prediction bias (PB), as defined in Section 13.5. The candidate 
model with the smallest value of MSEP, or some other combination of these statistics, is then 
selected. Because the validation set is independent of the training set, the MSEP gives an 
unbiased estimate of the squared error of prediction for the selected model. A model with 
too many explanatory variates, that over-fits the training set, is unlikely to give good pre- 
dictions for the unrelated validation set; hence, this approach guards against over-fitting. 

When the data set is too small to be divided into two separate subsets, k-fold cross-vali- 
dation can be used instead. This variant divides the data into k subsets of (approximately) 
equal size. Each subset is used in turn as the validation set, with the remainder allocated 
to the training set. Again, the candidate models are fitted for each training set and then 
predict the response in each validation set. The evaluation statistics are calculated for each 
validation set, accumulated across sets, and then the model with best overall predictive 
ability is chosen. In this case, the training and validation sets are clearly not independent, 
but can still give a reasonable comparison of predictive performance. 
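A minimal sketch of k-fold cross-validation for comparing two candidate models is given below. The data are synthetic and invented for illustration, the two candidates (an intercept-only model and a simple linear regression) are chosen for brevity, and units are allocated to folds systematically by index, whereas in practice the allocation would usually be random. Only the mean square error of prediction is accumulated here.

```python
def fit_mean(xs, ys):
    """Intercept-only model: predict the training mean everywhere."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_slr(xs, ys):
    """Simple linear regression fitted by least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def k_fold_msep(xs, ys, fitters, k=4):
    """Mean square error of prediction for each candidate, accumulated over k folds."""
    n = len(xs)
    sq_err = {name: 0.0 for name in fitters}
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]   # training set for this fold
        valid = [i for i in range(n) if i % k == fold]   # validation set for this fold
        for name, fit in fitters.items():
            predict = fit([xs[i] for i in train], [ys[i] for i in train])
            sq_err[name] += sum((ys[i] - predict(xs[i])) ** 2 for i in valid)
    return {name: total / n for name, total in sq_err.items()}

# synthetic data: a linear trend plus a small deterministic perturbation
xs = [float(i) for i in range(12)]
ys = [1.0 + 2.0 * x + (0.5 if i % 3 == 0 else -0.25) for i, x in enumerate(xs)]

msep = k_fold_msep(xs, ys, {"mean only": fit_mean, "linear": fit_slr}, k=4)
best = min(msep, key=msep.get)
print({name: round(value, 3) for name, value in msep.items()}, best)
```

The candidate with the smallest accumulated MSEP is selected; here that is the linear model, because the intercept-only model predicts the validation units poorly.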

14.9.4 Some Final Remarks on Procedures for Selecting Models

We have described above some of the perils of model selection. Despite these perils, model
selection is useful for identification of important explanatory variates and obtaining
predictions, so long as the results are treated with due scepticism. However, selection of a
single model can be dangerous, as often several models, possibly containing different
explanatory variates, can give a similar goodness of fit by adapting to different features of
the data. You should remember that there is usually no 'true' model, and that we are
usually seeking a descriptive model that gives a good prediction of the response. In general,
this does not imply any causative relationship, and often we shall be unable either to
identify or measure the underlying variables that actually cause the response. Hence, several
different descriptions can perform equally well and cross-validation can be used to
compare the predictive ability of different models. Alternatively, the technique of model
averaging might be used to combine predictions from several different models (see Chapter 8
of Hastie et al., 2001, for a discussion of this topic).

We have described our strategies for model selecfion in ferms of MLR, where the model 
consists of a sef of explanatory variates. The methods apply to any linear models, including 
the models containing variates and factors and their interactions introduced in Chapter 15, 
where this topic is discussed further. 

EXERCISES 

14.1 A random sample of a vegetatively propagated family of 4-year-old loblolly
pine trees was taken from a study located at Randolph County, Georgia, with
the objective of describing average crown width (CW, m) in terms of explanatory
variables that are simpler to measure, such as diameter at breast height
(DBH, cm), total tree height (Ht, m) and height to live crown (HLC, m). Two
additional crown variables were measured as the average from three randomly
selected branches from each tree: branch diameter (DiamB, cm) and angle
(AngB, degree). File crown.dat contains unit numbers (ID) with these variates
(CW, DBH, Ht, HLC, DiamB, AngB):

a. First consider the three simplest explanatory variates: DBH, Ht and HLC. Fit
a SLR model with response crown width (CW) for each individual explanatory
variate, and compare it to a MLR model including all three variates.
Can you reconcile the results? Which subset of these three variates best
describes crown width? Obtain residual plots to check the fit and write
down and interpret your final predictive model.

b. Now consider incorporating the two additional variables, DiamB and AngB,
and repeat the subset selection process. Do you obtain the same result by
considering these two additional variates with only those already selected
in part (a) and, if so, discuss whether this will always be true or whether this
strategy might sometimes fail?

Data from FBRC, University of Florida.

14.2 Samples of foliage from plots of red pine were analyzed to establish whether
foliar nutrients could predict growth (Bliss, 1970, Exercise 18.8). File foliar.dat
contains the plot number (Plot) with the quantity (mg) of potassium (variate
K) and calcium (variate Ca) found in foliar samples of given weight (variate
SampleWt, g) together with the increase in height (variate IncHt, ft) and basal
area (variate IncBA, sq ft/acre) over a 5-year period. Construct biologically
meaningful explanatory variates and fit MLR models for the increases in height
and basal area. Comment on the fit of your models, examine them for any
evidence of misspecification, and write down and interpret the best predictive
model in each case. Give a 95% CI for the increase in height and basal area for a
plot with 100 mg K and 20 mg Ca in a sample of weight 20 g.

14.3 A set of samples were processed to calibrate a near infrared reflectance (NIR)
instrument for the measurement of the protein content of ground wheat (Fearn,
1983). File ground.dat holds the sample number (Sample) and protein content
(variate Protein, %) established by a standard method and measurements of the
reflectance at six wavelengths (variates L1-L6). Can you find a stable MLR to
predict protein content? Write down and interpret your final predictive model.

14.4 A forest biomass study combined data generated over several years from
different research trials. In each study, several inventory plots were established
and 2-12 slash pine trees from these plots were felled and components of
biomass measured. The combined data set contains 174 trees from 50 inventory
plots with a wide range of ages and sizes. For each tree, the total aerial biomass
(TAB, kg), diameter at breast height (DBH, cm) and total height (Ht, m) were
measured. Stand level variables were also obtained from each plot, including
basal area (BA, m²/ha), total number of trees (N, trees/ha), quadratic diameter
(QD, cm) and stand age (Age, years). The objective of the study is to construct
a model that predicts total aerial biomass using the tree and stand variables.
File slash.dat contains unit numbers (ID) with plot numbers (Plot) and the
variates (TAB, Ht, DBH, BA, N, QD, Age). The response TAB is usually
log-transformed to obtain homogeneous variances. The generic model
suggested in the literature takes the form

$$\log_e(\mathit{TAB}_i) = \alpha + \beta_1 \log_e(\mathit{DBH}_i) + \beta_2 \log_e(\mathit{Ht}_i) + \beta_3 \log_e(\mathit{BA}_i) + \beta_4 \log_e(\mathit{N}_i) + \beta_5 \log_e(\mathit{QD}_i) + \beta_6 \log_e(\mathit{Age}_i) + e_i ,$$

i.e. a MLR with six explanatory variates, with both the response and explanatory
variates log-transformed. However, this set of explanatory variates often
shows strong multicollinearity. Fit the MLR model described above, and critically
evaluate it (e.g. investigate the collinearity, investigate misspecification
using partial residual plots, plot the observed data against the fitted values).
How robust is this model? Can you suggest a better model? (We re-visit these
data in Exercise 16.6.)*

14.5 Exercises 12.4 and 13.5 used SLR to predict dry matter (variate DryMatter) in
terms of one of four explanatory variates: MaxLength (length of the longest
stem), SumLength (sum of lengths of all stems), SumDiam (sum of diameters
of all stems) and LengthTop5 (average length of the five longest stems). These
variables are held in file willowstems.dat. Investigate whether you can obtain
better predictions of dry matter from a MLR model and check the fit of any
candidate models. If you have more than one candidate model, compare their
fit using cross-validation (as in Exercise 13.5b).



Data from FBRC, University of Florida.

14.6 In Example 14.2, a MLR model was found for a set of 50 traps from the EXAMINE
project in 1995 (data file examine.dat).

a. Use residual plots to check the fit of this model (hint: look at the fitted values
plot, a plot of the fitted versus observed values, Cook's statistics and partial
residual plots). Is there any cause for concern?

b. Data from the same study in 1996 are held (in the same format) in file
examine96.dat. Evaluate the three candidate models found in Example 14.2
(i.e. the models from forward selection, backward selection and the final
predictive model) by using cross-validation on the 1996 data. Which model
performs best?

c. Perform model selection on the 1996 data to establish some candidate models.
Check their fit using residual plots, as in part (a), and evaluate them by
cross-validation on the 1995 data. Can you draw any conclusions by comparing
the selected models across the 2 years?

d. Repeat the model selection exercise using the combined data set (held in
file examine9596.dat) and check the fit of candidate models using residual
plots. Write down your final predictive model. How much confidence
would you have in predicting for other years?

e. Describe the structure of the combined data set. Can you take account of this
structure in regression analysis? (We explore this further in Exercise 16.7.)




Models for Variates and Factors 



In the previous chapters, we developed models for one or more qualitative explanatory
variables (factors; Chapters 4 to 11) and models for one or more quantitative explanatory
variables (variates; Chapters 12 to 14). We now introduce models for a combination of
qualitative and quantitative explanatory variables, i.e. one or more factors with one or
more variates. Models for variates and factors arise in many situations, but they are simple
extensions of the models discussed previously. We can think of them as either adding
a variate to a model for factors, or vice versa. As an example of the first, consider a field
trial set up as a CRD to study the effect of different types of fertilizers, where the linear
model consists of a single factor to identify the response of each treatment group (fertilizer
type). If differences in plant size between plots had been noticed (and measured) before
the fertilizers were applied, the single factor model could be improved by incorporating an
explanatory variate to quantify, and hence enable a correction for, the effect of initial plant
size on final yield. This extension is known as analysis of covariance (ANCOVA), where
an explanatory variate is used to account for underlying differences between experimental
units. In the second case, we wish to incorporate information on groups into simple (or
multiple) linear regression. The groups may arise from the application of different
treatments to the experimental units (e.g. different varieties, or levels of water stress) or due to
observed differences between experimental units (e.g. males and females of a species, or
different soil types). Each group might exhibit a unique pattern of response, so the
purpose of analysis is to investigate the differences, which might require separate intercept or
slope parameters (or both) for each group. This process is often known as regression with
groups or parallel model analysis.

In this chapter, we first focus on regression with groups for the case of a single factor and
one explanatory variate. We start with an overview of the most common models (Section
15.1.1), and then give a detailed explanation of each model and the sequential analysis of
variance (ANOVA) used for model selection (Section 15.1.2). We consider some variations,
such as building the model from different sequences of sub-models (Section 15.1.3) or
imposing constraints on the intercept parameters (Section 15.1.4). There are several ways to
extend the model, and next we allow multiple variates, i.e. multiple linear regression with
groups (Section 15.2). We then discuss regression with groups as a method for modelling
linear trends within a structured designed experiment (Section 15.3) and explore the
relationship between ANCOVA and regression with groups (Section 15.4). We can define more
complex models, including both multiple factors and multiple variates, and in Section 15.5,
we discuss the issues that then arise in model selection and prediction. Finally, we note
that fitting a factor in a model is in fact equivalent to fitting a set of specially defined
explanatory variates, called dummy variates, and this equivalence is explained in Section
15.6. Using this representation, we can write the model in matrix format, as used in
mathematical statistical texts (Section 15.6.1).






15.1 Incorporating Groups into the Simple Linear Regression Model 

The aim of simple linear regression with groups (SLR with groups) is to investigate whether 
the straight line relationship between the response and a single explanatory variate changes 
from group to group. This is done by a combined analysis of the whole data set which, as 
usual, aims to find the simplest model that accounts for the patterns observed. As in the 
previous chapters, ANOVA is used as the tool for model selection. As a motivating example, 
consider a controlled environment trial where a range of doses of fungal inoculum are 
applied to a set of plants from several varieties under conditions known to be conducive 
to infection. The aim of the experiment is to see how the number of lesions present after 1 
week is related to the dose and whether this relationship changes across varieties, which 
may give a measure of variety resistance. If we can assume that the number of lesions 
increases linearly with the dose, then the most complex model for this experiment should 
allow separate lines (i.e. separate intercepts and separate slopes) for each variety (group). 
Observations consistent with this model are illustrated in Figure 15.1a. If ANOVA indicates 
no differences between the slopes of the regression lines across groups, then the model can 
be simplified, giving a set of parallel lines (with separate intercepts and a common slope); 
observations of this type are shown in Figure 15.1b. If there is no statistically significant 
difference in the average response levels, as indicated by the intercepts, then the model can
be simplified further, giving a single common line across groups, i.e. the SLR model (see
Figure 15.1c). In this case, it is sensible to check for evidence of any association between the
response and explanatory variate, and if there is not, then the response is best predicted by
a constant value, as for the observations shown in Figure 15.1d.

FIGURE 15.1

Data sets with three groups (•, ■, ▲) showing fitted lines ( — ) when the required model is (a) separate lines, (b)
parallel lines, (c) single line and (d) null.

15.1.1 An Overview of Possible Models 

We start with an overview of the models outlined above, using a simple parameterization. 
As previously, we obtain least squares estimates for the parameters in the models. We do 
not derive these estimates here, leaving the calculations to statistical software. To define 
the models, we extend the notation for the SLR model to take account of the presence of 
groups. For convenience, we label the observations by the group to which they belong and 
then number the observations within each group. We therefore use $y_{jk}$ to represent the
kth observed response in the jth group, with the corresponding value of the explanatory
variable denoted $x_{jk}$. The number of groups is denoted t, and so the index j runs from 1 to
t (j = 1 ... t), and the number of observations in the jth group is denoted $n_j$, so the
index k runs from 1 to $n_j$ (k = 1 ... $n_j$). The total sample size, denoted N as previously, is the
sum of the number of observations in each group, $N = n_1 + n_2 + \ldots + n_t$.

The most complex model for SLR with groups, which we call the separate lines model, 
allows a separate intercept and a separate slope for each of the t groups, and is written 
most simply as

$$y_{jk} = \alpha_j + \beta_j x_{jk} + e_{jk} , \qquad (15.1)$$

where the parameters $\alpha_j$ and $\beta_j$ represent the intercept and slope of the regression line for
the jth group, and $e_{jk}$ represents the random deviation for the kth observation in the jth
group. The assumptions associated with the linear model listed in Sections 4.1 and 12.1
also apply here. In symbolic form, this model is written as

Explanatory component: grp + x.grp

where x holds the values of the explanatory variate and grp is a factor indicating the
allocation of observations to groups. The grp term is associated with the individual group
intercepts ($\alpha_j$). The composite term containing the variate and factor, x.grp, fits a separate
slope for each group ($\beta_j$).
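
Because the separate lines model shares no intercept or slope parameters between groups, its least squares estimates are the same as those obtained by fitting a straight line to each group in turn. The sketch below illustrates this with simulated data; the function name `separate_lines`, the group sizes and the "true" parameter values are all hypothetical and chosen only for illustration.

```python
import numpy as np

def separate_lines(x, y, grp):
    """Least squares intercept and slope for each group (form of Equation 15.1)."""
    fits = {}
    for g in np.unique(grp):
        in_g = grp == g
        # np.polyfit returns coefficients from the highest power down.
        slope, intercept = np.polyfit(x[in_g], y[in_g], deg=1)
        fits[int(g)] = (intercept, slope)
    return fits

# Hypothetical data: t = 3 groups, each with its own intercept and slope.
rng = np.random.default_rng(0)
grp = np.repeat([1, 2, 3], [8, 6, 10])
x = rng.uniform(0, 10, size=grp.size)
true_alpha = {1: 1.0, 2: 3.0, 3: 5.0}
true_beta = {1: 2.0, 2: 1.0, 3: -0.5}
y = np.array([true_alpha[int(g)] + true_beta[int(g)] * xi for g, xi in zip(grp, x)])
y = y + rng.normal(scale=0.1, size=grp.size)

fits = separate_lines(x, y, grp)
for g, (a, b) in fits.items():
    print(f"group {g}: intercept {a:.3f}, slope {b:.3f}")
```

Note that only the parameter estimates coincide with the group-by-group fits; the combined analysis still pools the residual variation across groups when estimating the background variance.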

EXAMPLE 15.1A: STAND DENSITY OF MIXED NOTHOFAGUS FOREST PLOTS

A survey was done of 41 plots containing natural stands of pure or mixed Nothofagus
forest at the foot of the Andes. The resulting data can be found in file forest.dat and are
displayed in Table 15.1. The stands were classified into three types defined by the
dominant species within the stand (factor Type) which was Coigue (type 1 with 13 plots),
Rauli (type 2 with 9 plots) or Roble (type 3 with 19 plots). The variables recorded for
each plot were the number of trees per hectare (stand density, variate SD) and the mean
quadratic diameter in cm (variate QD).

The objective of the study was to model stand density as a function of quadratic
diameter and to compare this relationship among the three types of stand. The usual model
fitted to such data is a SLR model with both variables transformed to natural
logarithms, and we follow this convention here. A scatter plot of the transformed variables
(Figure 15.2) shows a negative relationship, with smaller log stand density corresponding
to larger values of log quadratic diameter. On this log-log scale, the relationship
is reasonably linear both overall and within groups, although the range of quadratic
diameter values for Rauli plots (group 2) is much smaller than for the other groups.

TABLE 15.1

Stand Density (Variate SD) and Mean Quadratic Diameter (Variate QD, cm) for 41 Plots of
Mixed Nothofagus Forest Classified into Three Stand Types (Factor Type) According to the
Dominant Species (Example 15.1A and File forest.dat)

Type      SD     QD       Type      SD     QD       Type      SD     QD
Coigue    1780   22.11    Rauli     2970   13.02    Roble     3440   11.60
Coigue     980   30.50    Rauli     1500   14.84    Roble     1600   13.17
Coigue    3100   16.98    Rauli     4080   15.02    Roble     3100    9.48
Coigue    4120   12.69    Rauli     1600   15.44    Roble     1420   17.85
Coigue    2280   17.92    Rauli     2040   18.66    Roble     2060   15.85
Coigue    4760   15.19    Rauli     1960   18.02    Roble     2440   14.54
Coigue    4960   12.00    Rauli     2120   15.20    Roble     1720   16.20
Coigue    1520   19.51    Rauli     2160   19.60    Roble     1220   18.27
Coigue    1480   21.39    Rauli     2720   11.53    Roble     4080    8.88
Coigue    5560   10.87    Roble     3890   11.95    Roble     3440   11.65
Coigue    2000   23.94    Roble     1070   22.74    Roble      760   26.31
Coigue    2960   14.21    Roble     1720   14.41    Roble     3840   12.04
Coigue    3240   19.67    Roble     2920   12.58    Roble     1600   14.38
                          Roble     2960   11.64    Roble     2320   12.60

Source: Data from Dra. Alicia Ortega Z., Universidad Austral de Chile.

The separate lines model can be written as

$$\mathit{logSD}_{jk} = \alpha_j + \beta_j \, \mathit{logQD}_{jk} + e_{jk} ,$$

where $\mathit{logSD}_{jk}$ and $\mathit{logQD}_{jk}$ are the natural logarithms of the stand density and mean
quadratic diameter in the kth plot of the jth stand type, respectively (j = 1 ... 3, k = 1 ... $n_j$,
for $n_1 = 13$, $n_2 = 9$, $n_3 = 19$), $\alpha_j$ and $\beta_j$ are the intercept and slope for the jth stand type, and
$e_{jk}$ are the random deviations. In symbolic form, this model is written as

Response variable: logSD

Explanatory component: Type + logQD.Type

where logSD = log(SD) and logQD = log(QD) are the $\log_e$-transformed variates. The
explanatory terms Type and logQD.Type are associated with the separate intercepts and
slopes for each stand type, respectively. The fitted model (adjusted R² = 0.734) is shown
in Figure 15.3a, and the parameter estimates are given with their standard errors in
Table 15.2.

FIGURE 15.2

Logged stand density plotted against logged quadratic diameter (cm) for 41 plots classified by the dominant
species (• Coigue/group 1; ■ Rauli/group 2; ▲ Roble/group 3) (Example 15.1A).

The parallel lines model allows a separate intercept for each group, but imposes a
common slope so that the fitted model consists of a set of parallel lines. This model can be
written in its simplest form as

$$y_{jk} = \alpha_j + \beta x_{jk} + e_{jk} , \qquad (15.2)$$

where the parameter $\alpha_j$ still represents the intercept of the regression line for the jth group,
j = 1 ... t, and parameter $\beta$ now represents the common slope of the parallel lines. In
symbolic form, this model is written as

Explanatory component: grp + x

Here, the term grp is again associated with the individual group intercepts ($\alpha_j$), and the
term x is associated with the common slope parameter ($\beta$).

FIGURE 15.3

Logged stand density (SD) plotted against logged quadratic diameter (QD, cm): • Coigue; ■ Rauli; ▲ Roble with
fitted lines ( — ) generated by: (a) separate lines, (b) parallel lines, (c) single line and (d) null models (Example
15.1A).

TABLE 15.2

Models for Logged Stand Density in Terms of Stand Type (Factor Type) and Explanatory Variate
$\log_e$(Quadratic Diameter) (Variate logQD) (Examples 15.1A to C)

                                 Separate Lines Model    Parallel Lines Model    Single Line Model
                                 (Example 15.1A)         (Example 15.1B)         (Example 15.1C)
Term              Parameter      Estimate    SE          Estimate    SE          Estimate    SE
[1]               $\alpha$       -           -           -           -           11.115      0.5174
Type 1 (Coigue)   $\alpha_1$     12.534      0.6734      12.270      0.4393      -           -
Type 2 (Rauli)    $\alpha_2$     9.497       1.3727      11.926      0.4236      -           -
Type 3 (Roble)    $\alpha_3$     11.949      0.5536      11.735      0.4041      -           -
logQD             $\beta$        -           -           -1.536      0.1516      -1.232      0.1884
logQD.Type 1      $\beta_1$      -1.628      0.2341      -           -           -           -
logQD.Type 2      $\beta_2$      -0.650      0.4999      -           -           -           -
logQD.Type 3      $\beta_3$      -1.617      0.2087      -           -           -           -

SE = standard error.



EXAMPLE 15.1B: STAND DENSITY OF MIXED NOTHOFAGUS FOREST PLOTS 

The parallel lines model for logged stand density is written as 

$$\mathit{logSD}_{jk} = \alpha_j + \beta \, \mathit{logQD}_{jk} + e_{jk} .$$

Parameter $\alpha_j$ is the intercept for the jth stand type, and $\beta$ is the common slope of the
decrease for logged mean quadratic diameter. In symbolic form, the explanatory model
is written as

Explanatory component: Type + logQD

The Type term is associated with the separate intercepts for each stand type, and the
term logQD is associated with the common slope. The fitted model (adjusted R² = 0.723)
is shown in Figure 15.3b and the parameter estimates are given in Table 15.2. For this
data set, the parallel lines model appears similar to the separate lines model,
particularly for stands with Roble and Coigue as the dominant species.

The single line model does not allow for any difference between groups and is just a
SLR model, written as

$$y_{jk} = \alpha + \beta x_{jk} + e_{jk} , \qquad (15.3)$$

where the parameters $\alpha$ and $\beta$ now represent the intercept and slope, respectively, of the
common regression line. In symbolic form, this explanatory model is written as

Explanatory component: [1] + x

Recall from Chapter 12 that [1] represents a variate of length N that takes value 1
everywhere and is associated with the intercept parameter, $\alpha$. As above, term x is associated
with the common slope parameter, $\beta$.

EXAMPLE 15.1C: STAND DENSITY OF MIXED NOTHOFAGUS FOREST PLOTS

The single line model for logged stand density is written as

$$\mathit{logSD}_{jk} = \alpha + \beta \, \mathit{logQD}_{jk} + e_{jk} .$$

Parameter $\alpha$ is the intercept and $\beta$ is the slope of the common line. In symbolic form, the
explanatory model is written as

Explanatory component: [1] + logQD

The fitted model (adjusted R² = 0.511) is shown in Figure 15.3c and the parameter
estimates are given in Table 15.2. For this data set, this model appears inappropriate, as
most of the observations for Coigue-type stands appear above the fitted line, and most
of those for Roble-type stands appear below the fitted line.

The null model does not allow for any difference between groups or for any relationship
with the explanatory variate and is written as

$$y_{jk} = \alpha + e_{jk} ,$$

where parameter $\alpha$ now represents the overall population mean. In symbolic form, this
model is written as a single term

Explanatory component: [1]

where the term [1] is associated with the parameter $\alpha$.

EXAMPLE 15.1D: STAND DENSITY OF MIXED NOTHOFAGUS FOREST PLOTS

The null model is written as

$$\mathit{logSD}_{jk} = \alpha + e_{jk} .$$

Parameter $\alpha$ now represents the population mean logged stand density. The symbolic
form was shown above. The fitted model, with adjusted R² = 0 and $\hat{\alpha}$ = 7.750 (SE 0.0731),
is shown in Figure 15.3d and is clearly inappropriate for this data set as it does not
capture the clear negative correlation between the logged number of trees and the logged
mean quadratic diameter.

We use ANOVA to determine the appropriate model for any data set objectively, by
fitting the models described above in order of increasing complexity, i.e. the single line,
parallel lines and separate lines models. There is no need to fit the null model as it is
automatically used as the baseline for comparisons. This sequence of models can be fitted by
progressive addition of terms into the model, but that procedure leads to a more
complicated parameterization than used in this section. However, this parameterization is the
default in most statistical software and so we explain it in some detail in the next section.



15.1.2 Defining and Choosing between the Models 

We now look at the single line, parallel lines and separate lines models in more detail, using 
a more standard parameterization, and use the resulting sequential ANOVA to determine 
the most appropriate model for a given data set. As in the previous chapters (see Sections
11.2 and 14.4), each model is quantified in terms of its model sum of squares, ModSS, and
df, ModDF. Recall that the model sum of squares (SS) is the sum of squared differences 
between the fitted values from the current model and those from the baseline (null) model. 
The model df is equal to the number of independent parameters required to fit the model 
minus one, where the adjustment accounts for the single parameter in the baseline model. 
A sequential ANOVA table is constructed from the incremental sums of squares and df 
derived from these model sums of squares and df (details below), and we can use F-tests 
from this ANOVA table to find an appropriate model for the observed responses. 
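
In the notation of Section 15.1.1, these two definitions can be written compactly as below. The symbols $\hat{y}_{jk}$ (fitted value from the current model), $\bar{y}$ (overall mean of the responses, which is the fitted value under the null model) and $p$ (number of independent parameters in the current model) are introduced here purely for illustration.

```latex
\mathrm{ModSS} = \sum_{j=1}^{t} \sum_{k=1}^{n_j} \left( \hat{y}_{jk} - \bar{y} \right)^2 ,
\qquad
\mathrm{ModDF} = p - 1 .
```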



15.1.2.1 Single Line Model 

As described above, the single line model is a SLR model that ignores groups and takes the 
form shown in Equation 15.3, with symbolic form 

Explanatory component: [1] + x

As in Section 14.4, we represent the SS for this model as ModSS([1] + x) with
ModDF([1] + x) = 1.

EXAMPLE 15.1E: STAND DENSITY OF MIXED NOTHOFAGUS FOREST PLOTS

For the stand density data, the parameter estimates are in Table 15.2 and the fitted model 
appears in Figure 15.3c. The model SS is equal to 4.583 with 1 df. 



15.1.2.2 Parallel Lines Model

The parallel lines model represents the case where groups are constrained to have the
same slope but allowed to have different intercepts. One simple form of this model is
shown in Equation 15.2. However, it can also be considered as an extension of the single
line model obtained by addition of the group factor into that model. Hence, the parallel
lines model can also be written in symbolic form as

Explanatory component: [1] + x + grp

The variate [1] is still associated with a common intercept, and factor grp introduces a
separate intercept for each group. In mathematical form, this model is written as

$$y_{jk} = \alpha + \beta x_{jk} + \nu_j + e_{jk} ,$$

where $\beta$ is still the common slope of the regression lines across groups, $\alpha$ can be thought
of as an overall intercept, and the parameters $\nu_j$ (j = 1 ... t) can be thought of as
group-specific deviations from the overall intercept. Unfortunately, this model is now
over-parameterized as we have only t groups, but have t + 1 parameters that determine the
intercepts of those t groups. Some form of constraint on the parameter estimates is
therefore required. One possibility (used in Section 15.1.1) is the omission of the overall
intercept term. The other possibility is to impose a constraint on the set of group intercepts,
$\nu_j$ (j = 1 ... t) and this is the choice usually made within statistical software. Here, we use
first-level-zero constraints, which set $\nu_1 = 0$ and were previously used for factor models
in Sections 4.5, 8.2.6 and 11.2.1. This parameterization changes the interpretation of the
intercept and so we relabel that parameter. Some software packages, including SAS, use
last-level-zero constraints which follow similar underlying principles but set $\nu_t = 0$ (see
Section 4.5). Grouping the intercept terms together and relabelling parameters produces
the model

$$y_{jk} = (\alpha_1 + \nu_j) + \beta x_{jk} + e_{jk} .$$

The intercept for the jth group is now $\alpha_1 + \nu_j$. Here, $\alpha_1$ is still associated with a variate
with value 1 everywhere (written symbolically as [1]), but because of the constraint
$\nu_1 = 0$, parameter $\alpha_1$ is now equal in value to the intercept for the first group. The
parameter $\nu_j$ is the difference between the intercepts for the jth and first groups (associated
with the grouping factor, grp). The model SS is written as ModSS([1] + x + grp), and
this model estimates t intercepts and one slope parameter, hence ModDF([1] + x + grp) =
(t + 1) - 1 = t.



EXAMPLE 15.1F: STAND DENSITY OF MIXED NOTHOFAGUS FOREST PLOTS

Using first-level-zero parameterization, the parallel lines model for logged stand
density is written as

$$\mathit{logSD}_{jk} = (\alpha_1 + \nu_j) + \beta \, \mathit{logQD}_{jk} + e_{jk} ,$$

with the explanatory component of the model written in symbolic form as

Explanatory component: [1] + logQD + Type

Parameter estimates for this model are $\hat{\beta}$ = -1.536 (SE 0.1516), $\hat{\alpha}_1$ = 12.270 (SE 0.4393),
$\hat{\nu}_2$ = -0.344 (SE 0.1083) and $\hat{\nu}_3$ = -0.536 (SE 0.0948), with $\hat{\nu}_1$ fixed equal to zero. The
fitted models for the three stand types are therefore

Coigue (group 1): $\widehat{\mathit{logSD}}_{1k} = (\hat{\alpha}_1 + \hat{\nu}_1) + \hat{\beta} \, \mathit{logQD}_{1k} = 12.270 - 1.536 \, \mathit{logQD}_{1k}$

Rauli (group 2): $\widehat{\mathit{logSD}}_{2k} = (\hat{\alpha}_1 + \hat{\nu}_2) + \hat{\beta} \, \mathit{logQD}_{2k} = 11.926 - 1.536 \, \mathit{logQD}_{2k}$

Roble (group 3): $\widehat{\mathit{logSD}}_{3k} = (\hat{\alpha}_1 + \hat{\nu}_3) + \hat{\beta} \, \mathit{logQD}_{3k} = 11.735 - 1.536 \, \mathit{logQD}_{3k}$

This results in the same group intercepts as obtained in Example 15.1B and Table 15.2.
The fitted parallel lines are shown with the data in Figure 15.3b, and the model SS is
6.524 with 3 df.
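
As a check on the link between the two parameterizations, the group intercepts can be recovered from the first-level-zero estimates quoted in this example by simple addition. The arithmetic below uses only those rounded estimates, so the result for Roble differs from the value in Table 15.2 (11.735) in the final digit, which is consistent with rounding of the individual estimates.

```latex
\begin{aligned}
\text{Coigue:} \quad \hat{\alpha}_1 + \hat{\nu}_1 &= 12.270 + 0.000 = 12.270 \\
\text{Rauli:}  \quad \hat{\alpha}_1 + \hat{\nu}_2 &= 12.270 - 0.344 = 11.926 \\
\text{Roble:}  \quad \hat{\alpha}_1 + \hat{\nu}_3 &= 12.270 - 0.536 = 11.734 \approx 11.735
\end{aligned}
```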
15.1.2.3 Separate Lines Model

This model allows separate intercepts and separate slopes for each of the t groups and was
shown in simple form in Equation 15.1. This model can be considered as an extension of the
parallel lines model, obtained by addition of a combined term formed from the explanatory
variate and the groups factor. In symbolic form, the separate lines model is then written as

Explanatory component: [1] + x + grp + x.grp

Here, the term x is still associated with a component of slope held in common across
groups, and the added term x.grp specifies that a different slope for variate x is to be fitted
for each of the t groups. In mathematical form, this model is written in full as

$$y_{jk} = \alpha + \beta x_{jk} + \nu_j + \eta_j x_{jk} + e_{jk} ,$$

where parameters are defined as above, but with the introduction of $\eta_j$ (j = 1 ... t) as
group-specific deviations from the common slope, $\beta$. This model is now over-parameterized in
terms of both the intercepts and the slopes, as we still have only t groups, but t + 1
parameters that determine the intercepts and t + 1 parameters that determine the slopes for those
t groups. So, now we implement the first-level-zero parameterization for both intercepts
and slopes. Grouping the intercept terms and the slope terms together, and again slightly
relabelling the parameters produces the model

$$y_{jk} = (\alpha_1 + \nu_j) + (\beta_1 + \eta_j) x_{jk} + e_{jk} ,$$

with constraints $\nu_1 = 0$ and $\eta_1 = 0$. As above, the intercept for the jth group is $\alpha_1 + \nu_j$, and
the slope for the jth group is now $\beta_1 + \eta_j$. Here, $\beta_1$ is still associated with the explanatory
variate (x), but because of the constraint $\eta_1 = 0$, $\beta_1$ is now equal in value to the slope for the
first group. The parameter $\eta_j$ (associated with term x.grp) equals the difference between
the slopes for the jth and first groups. The model SS is written as ModSS([1] + x + grp + x.grp),
and this model estimates t intercept and t slope parameters, with ModDF([1] + x + grp +
x.grp) = 2t - 1.
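
The first-level-zero parameterization can be made concrete by building the columns that correspond to the terms [1] + x + grp + x.grp. The following sketch uses simulated data and indicator columns for groups 2 to t; the function name `first_level_zero_design` and all numerical values are hypothetical. It then checks that $\alpha_1 + \nu_j$ and $\beta_1 + \eta_j$ reproduce the intercept and slope obtained by fitting each group on its own.

```python
import numpy as np

def first_level_zero_design(x, grp):
    """Columns for [1] + x + grp + x.grp with nu_1 = 0 and eta_1 = 0."""
    levels = np.unique(grp)
    cols = [np.ones_like(x), x]                       # alpha_1 and beta_1
    dummies = [(grp == g).astype(float) for g in levels[1:]]
    cols += dummies                                   # nu_2 ... nu_t
    cols += [d * x for d in dummies]                  # eta_2 ... eta_t
    return np.column_stack(cols), levels

# Hypothetical data with t = 3 groups.
rng = np.random.default_rng(42)
grp = np.repeat([1, 2, 3], [7, 5, 9])
x = rng.uniform(2.0, 4.0, size=grp.size)
y = (10.0 - 1.5 * x + 0.4 * (grp == 2) - 0.3 * x * (grp == 3)
     + rng.normal(scale=0.2, size=grp.size))

X, levels = first_level_zero_design(x, grp)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
t = len(levels)
alpha1, beta1 = coef[0], coef[1]
nu = np.concatenate([[0.0], coef[2:t + 1]])           # nu_1 fixed at zero
eta = np.concatenate([[0.0], coef[t + 1:]])           # eta_1 fixed at zero

recovered, direct = [], []
for j, g in enumerate(levels):
    # Line for group j implied by the combined fit ...
    recovered.append((alpha1 + nu[j], beta1 + eta[j]))
    # ... compared with a straight line fitted to that group alone.
    slope_j, intercept_j = np.polyfit(x[grp == g], y[grp == g], deg=1)
    direct.append((intercept_j, slope_j))
print(np.round(recovered, 3))
print(np.round(direct, 3))
```

The design matrix has 2t columns, one for each of the t intercept and t slope parameters, which matches the model df of 2t - 1 once the baseline parameter is subtracted.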

EXAMPLE 15.1G: STAND DENSITY OE MIXED NOTHOEAGUS EOREST PLOTS 

Using first-level-zero parameterization, we write the separate lines model for logged
stand density as

$$\mathrm{logSD}_{jk} = (\alpha_1 + \nu_j) + (\beta_1 + \eta_j)\,\mathrm{logQD}_{jk} + e_{jk},$$

with the explanatory component of the model written in symbolic form as

Explanatory component: [1] + logQD + Type + logQD.Type

Parameter estimates for this model are shown in Table 15.3, from which we can derive
the fitted model for each stand type as

Coigue (group 1): $\widehat{\mathrm{logSD}}_{1k} = (\hat\alpha_1 + \hat\nu_1) + (\hat\beta_1 + \hat\eta_1) x_{1k} = 12.534 - 1.628 x_{1k}$,

Rauli (group 2): $\widehat{\mathrm{logSD}}_{2k} = (\hat\alpha_1 + \hat\nu_2) + (\hat\beta_1 + \hat\eta_2) x_{2k} = 9.497 - 0.650 x_{2k}$,

Roble (group 3): $\widehat{\mathrm{logSD}}_{3k} = (\hat\alpha_1 + \hat\nu_3) + (\hat\beta_1 + \hat\eta_3) x_{3k} = 11.949 - 1.617 x_{3k}$.

TABLE 15.3
Parameter Estimates with Standard Errors (SE), t-Statistics (t) and Observed Significance
Levels (P) for a Separate Lines Model for Logged Stand Density in Terms of Factor Type
(1 = Coigue, 2 = Rauli, 3 = Roble) and Explanatory Variate logQD (Example 15.1G)

Term            Parameter    Estimate   SE       t        P
[1]             $\alpha_1$   12.534     0.6734   18.615   < 0.001
logQD           $\beta_1$    -1.628     0.2341   -6.955   < 0.001
Type 1          $\nu_1$      0.000      -        -        -
Type 2          $\nu_2$      -3.037     1.5290   -1.986   0.055
Type 3          $\nu_3$      -0.586     0.8717   -0.672   0.506
logQD.Type 1    $\eta_1$     0.000      -        -        -
logQD.Type 2    $\eta_2$     0.978      0.5520   1.772    0.085
logQD.Type 3    $\eta_3$     0.011      0.3136   0.036    0.972

Again, these results match the group intercepts and slopes obtained in Example 15.1A
and Table 15.2. The fitted separate lines are shown with the data in Figure 15.3a, and the
model SS is 6.725 with 5 df.
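
The step from the first-level-zero estimates to the fitted line for each stand type is simple
addition, and can be sketched in a few lines of Python. The estimates are copied from Table 15.3;
the variable and function names are chosen for this illustration only.

```python
# First-level-zero estimates copied from Table 15.3.
alpha_1 = 12.534                          # [1]: intercept for group 1 (Coigue)
beta_1 = -1.628                           # logQD: slope for group 1
nu = {"Coigue": 0.000, "Rauli": -3.037, "Roble": -0.586}   # Type effects
eta = {"Coigue": 0.000, "Rauli": 0.978, "Roble": 0.011}    # logQD.Type effects


def fitted_line(stand_type):
    """Return (intercept, slope) of the fitted line for one stand type."""
    return alpha_1 + nu[stand_type], beta_1 + eta[stand_type]


lines = {s: fitted_line(s) for s in nu}
for s, (a, b) in lines.items():
    print(f"{s}: logSD = {a:.3f} + ({b:.3f}) * logQD")
```

Because the inputs are the rounded table values, the Roble intercept is reproduced as 11.948
rather than the 11.949 quoted in the example; the other intercepts and slopes agree to three
decimal places.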

15.1.2.4 Choosing between the Models: The Sequential ANOVA Table 

We now have all of the ingredients required to build the sequential ANOVA table for this 
set of models, starting with the single line model, moving to the intermediate parallel 
lines model and then to the more complex separate lines model. Recall from Sections 11.2 
and 14.4 that the incremental sums of squares and df in the sequential ANOVA table are 
calculated from the increase in the model SS and df as terms are added into the model. We 
calculate the incremental SS and df for the single line model by taking differences with the 
null, or baseline model, containing only the intercept term [1], which has ModSS([1]) = 0
and ModDF([1]) = 0. Hence

SS(x | [1]) = ModSS([1] + x) - ModSS([1]) = ModSS([1] + x)
DF(x | [1]) = ModDF([1] + x) - ModDF([1]) = ModDF([1] + x) = 1

Similarly, on moving from the single line model to the parallel lines model, we obtain

SS(grp | [1] + x) = ModSS([1] + x + grp) - ModSS([1] + x)
DF(grp | [1] + x) = ModDF([1] + x + grp) - ModDF([1] + x) = t - 1

Finally, moving onto the separate lines model, we obtain

SS(x.grp | [1] + x + grp) = ModSS([1] + x + grp + x.grp) - ModSS([1] + x + grp)
DF(x.grp | [1] + x + grp) = ModDF([1] + x + grp + x.grp) - ModDF([1] + x + grp) = t - 1

Recall also that within a sequential ANOVA table, we can abbreviate the incremental SS as
SS(+term) to indicate that the term has been added into a model that already contains all of
the terms in previous lines of the table. We follow a similar convention for the incremental
DF. For this sequence of models, the incremental SS and DF can therefore be written as

SS(+x) = SS(x | [1]), DF(+x) = 1

SS(+grp) = SS(grp | [1] + x), DF(+grp) = t - 1

SS(+x.grp) = SS(x.grp | [1] + x + grp), DF(+x.grp) = t - 1

We calculate mean squares as usual, by division of the incremental sums of squares by
their df, and variance ratios by division of the mean squares by the residual mean square,
ResMS, from the separate lines model. The sequential ANOVA table takes the form in
Table 15.4.

As in SLR and MLR models, we should check for model misspecification before drawing
conclusions. The residual plots described in Chapters 5 and 13 can be used for this check,
and transformations may be used to stabilize variances if required (see Chapter 6). It may
help to plot a separate graph for each group, especially when there are many observations
within each group. If there is no suggestion of model misspecification, then we can proceed
to use the sequential ANOVA to identify a parsimonious predictive model, i.e. the
simplest model that describes the data well. As in Chapters 8 and 11, our full model is well
defined, and so we start with the most complex model and progressively try to simplify it.

The variance ratio $F^{x.grp}$ is associated with adding the final term x.grp into the parallel
lines model to obtain the separate lines model. This tests the null hypothesis that the separate
lines model gives no statistical improvement over the parallel lines model, i.e. $H_0$: $\eta_j = 0$
for j = 1 ... t, against the general alternative that this is not the case. If the null hypothesis is
true, the variance ratio $F^{x.grp}$ has an F-distribution with t - 1 numerator and N - 2t denominator
df. If $F^{x.grp}$ is larger than the $100(1 - \alpha_s)$th percentile of this distribution, we reject this
null hypothesis (at significance level $\alpha_s$), conclude that we cannot simplify the separate lines
model, and use this as our predictive model. In this case, the t separate slopes and t separate
intercepts should be reported with their standard errors. If $F^{x.grp}$ is not significant, we move
on to investigate the parallel lines model, using variance ratio $F^{grp}$.

The variance ratio $F^{grp}$ is associated with addition of the factor term grp into the single
line model to obtain the parallel lines model. This tests the null hypothesis that the parallel
lines model gives no statistical improvement over the single line (i.e. SLR) model, i.e.
$H_0$: $\nu_j = 0$ for j = 1 ... t, against the general alternative that this is not the case. If the null
hypothesis is true, the variance ratio $F^{grp}$ also has an F-distribution with t - 1 numerator
and N - 2t denominator df. If $F^{grp}$ is larger than the $100(1 - \alpha_s)$th percentile of this
distribution, we reject this null hypothesis and conclude that we cannot simplify the parallel lines
model. In this case, the common slope and t separate intercepts should be reported with
their standard errors. If $F^{grp}$ is not significant, we move on to investigate the single line
model, using variance ratio $F^{x}$.

TABLE 15.4
Form of Sequential ANOVA Table for Regression with Groups Using Factor grp
and Explanatory Variate x

Term Added   Incremental df   Incremental SS   Mean Square   Variance Ratio
+ x          1                SS(+x)           MS(+x)        F^x = MS(+x)/ResMS
+ grp        t - 1            SS(+grp)         MS(+grp)      F^grp = MS(+grp)/ResMS
+ x.grp      t - 1            SS(+x.grp)       MS(+x.grp)    F^x.grp = MS(+x.grp)/ResMS
Residual     N - 2t           ResSS            ResMS
Total        N - 1            TotSS

SS = sum of squares.

The variance ratio $F^{x}$, associated with addition of the variate x into the null model to
obtain the SLR model, is the test statistic for the slope in a SLR model (Section 12.3). If $F^{x}$ is
significant, then we conclude that the single line model adequately represents the data set,
and the common slope and intercept should be reported with their standard errors. If $F^{x}$ is
not significant then we conclude there is no linear relationship between the response and
the explanatory variate.



EXAMPLE 15.1H: STAND DENSITY OF MIXED NOTHOFAGUS FOREST PLOTS 

The model SS and df from Examples 15.1E to G give the sequential ANOVA table in 
Table 15.5. We have already verified (Figure 15.2) that the relationship between the 
response and explanatory variate is approximately a straight line within each group 
and that there is no sign of model misspecification. A composite set of residual plots 
from the separate lines model is shown in Figure 15.4, with points labelled by groups, 
and shows no real cause for concern. We judge that the pattern in the absolute residual 
plot reflects the sparsity of observations for small or large fitted values rather than vari- 
ance heterogeneity. 

We therefore proceed to interpret the ANOVA table to establish our predictive model. 
The variance ratio for separate lines, $F^{logQD.Type}$ = 1.724 with 2 and 35 df (P = 0.193), gives no
evidence that the slopes differ between types of plot. We therefore examine variance
ratio $F^{Type}$ to test the null hypothesis that the intercepts are all equal. Here, $F^{Type}$ = 16.629
on 2 and 35 df (P < 0.001) giving strong evidence that separate intercepts are required,
so we use the parallel lines model to describe the relationship between logged stand 
density and logged quadratic diameter. The equations of the fitted parallel lines were 
given in Example 15.1F. Logged stand density is smaller for larger values of logged 
quadratic diameter, and decreases at the same rate for all three stand types. For a given 
value of quadratic diameter, logged stand density for Rauli and Roble type stands is 
on average 0.344 and 0.536 units less, respectively, than for Coigue type stands. Figure 
15.3b showed the data with the fitted parallel lines superimposed; this model appears 
to describe well the response within the observed range of logged quadratic diameter. 
In Exercise 15.4, we ask you to interpret this model on the original scale. 
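
The mean squares and variance ratios in Table 15.5 follow mechanically from the incremental
SS and df. The sketch below, with a function name chosen for this illustration, reproduces
them from the tabulated values. Observed significance levels are not computed, because that
would need an F-distribution routine from outside the Python standard library.

```python
def sequential_anova(increments, res_ss, res_df):
    """Build sequential ANOVA rows from incremental SS and df.

    increments: list of (term, incremental_df, incremental_ss) in fitting order.
    Every variance ratio uses the residual mean square of the full model.
    """
    res_ms = res_ss / res_df
    rows = []
    for term, df, ss in increments:
        ms = ss / df
        rows.append((term, df, ss, ms, ms / res_ms))
    return rows, res_ms


# Incremental df and SS copied from Table 15.5.
rows, res_ms = sequential_anova(
    [("+ logQD", 1, 4.5833), ("+ Type", 2, 1.9403), ("+ logQD.Type", 2, 0.2011)],
    res_ss=2.0419, res_df=35,
)
for term, df, ss, ms, f in rows:
    print(f"{term:<14}{df:>3}{ss:>9.4f}{ms:>9.4f}{f:>9.2f}")
```

The variance ratios agree with the tabulated 78.562, 16.629 and 1.724 up to the rounding of
the inputs.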



TABLE 15.5
Sequential ANOVA Table for Regression with Groups for Logged Stand Density
with Factor Type and Variate logQD (Example 15.1H)

Term Added     Incremental df   Incremental SS   Mean Square   Variance Ratio           P
+ logQD        1                4.5833           4.5833        F^logQD = 78.562         < 0.001
+ Type         2                1.9403           0.9701        F^Type = 16.629          < 0.001
+ logQD.Type   2                0.2011           0.1006        F^logQD.Type = 1.724     0.193
Residual       35               2.0419           0.0583
Total          40               8.7667

SS = sum of squares.

FIGURE 15.4
Composite set of residual plots based on standardized residuals (points labelled by stand
type: Coigue, Rauli, Roble) from the separate lines model for logged stand density
(Example 15.1H).



In this section, we have outlined the basic procedure for identifying a good model for
simple linear regression with groups. The main complication is in the different
parameterizations that can be used. It is important to realize that although different
parameterizations of a model result in different estimates of individual parameters, the fitted
model is invariant to the parameterization, i.e. the same fitted model, and hence predictions, are
obtained for any valid parameterization. Similarly, the sequential ANOVA table will have
the same entries regardless of the model parameterization.

We have explained the first-level-zero parameterization in some detail because this type
of parameterization is common in statistical packages, and is commonly misunderstood.
Having explained the principles for obtaining the fitted model from the parameter
estimates, the next challenge is in obtaining SEs for the amalgamated estimates of intercept
and slope. Calculation of the SE of a sum (or difference) of estimates requires their
covariances (see Example 15.1I and Section C.4), which might not be presented as standard
output. One way to avoid this issue is to use the simple parameterization of Section 15.1.1 for
the selected model, so that each intercept and slope is represented by a single parameter.
Some models with several grouping factors cannot be represented in this manner, but
most statistical software has facilities to calculate a SE for linear combinations of
parameter estimates that can be used in this situation. Alternatively, if the interest is more in
prediction of the expected response than in the components of the model, it may be more
informative to produce predictions for specific values of the explanatory variates, and to
quantify uncertainty with the prediction SEs or confidence intervals (CIs).

EXAMPLE 15.1I: STAND DENSITY OF MIXED NOTHOFAGUS FOREST PLOTS

Standard errors for the combined estimates of the intercepts in the parallel lines model 
(Example 15.1F) can be derived from the variance-covariance matrix for estimates of 
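the unconstrained parameters, which takes the form

$$
\mathrm{Var}\begin{pmatrix} \hat\alpha_1 \\ \hat\beta \\ \hat\nu_2 \\ \hat\nu_3 \end{pmatrix} =
\begin{pmatrix}
 0.1929 &        &        &        \\
-0.0658 & 0.0230 &        &        \\
-0.0126 & 0.0028 & 0.0117 &        \\
-0.0193 & 0.0051 & 0.0053 & 0.0090
\end{pmatrix} .
$$

For example, the variance of the intercept for Roble stands (group 3) can be expressed
as

$$
\begin{aligned}
\mathrm{Var}(\hat\alpha_1 + \hat\nu_3) &= \mathrm{Var}(\hat\alpha_1) + 2\,\mathrm{Cov}(\hat\alpha_1, \hat\nu_3) + \mathrm{Var}(\hat\nu_3) \\
 &= 0.1929 + (2 \times -0.0193) + 0.0090 \\
 &= 0.1633 .
\end{aligned}
$$

The estimated SE is then the square root of this value, with $\mathrm{SE}(\hat\alpha_1 + \hat\nu_3) = 0.4041$. A
similar calculation gives $\mathrm{SE}(\hat\alpha_1 + \hat\nu_2) = 0.4236$, and
$\mathrm{SE}(\hat\alpha_1 + \hat\nu_1) = \mathrm{SE}(\hat\alpha_1) = 0.4393$ since
$\nu_1 = 0$, matching the estimated SEs shown in Table 15.2.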
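
The same calculation can be written generally as the square root of w'Vw, where w holds the
weights of a linear combination and V is the variance-covariance matrix. The following Python
sketch uses the values from the matrix above; the parameter labels and the function name
`se_linear_combination` are assumptions made for this illustration.

```python
import math

# Lower triangle of the variance-covariance matrix from Example 15.1I,
# stored symmetrically; the labels are names chosen for this sketch.
names = ["alpha1", "beta", "nu2", "nu3"]
lower = [
    [0.1929],
    [-0.0658, 0.0230],
    [-0.0126, 0.0028, 0.0117],
    [-0.0193, 0.0051, 0.0053, 0.0090],
]
cov = {}
for i, row in enumerate(lower):
    for j, value in enumerate(row):
        cov[(names[i], names[j])] = value
        cov[(names[j], names[i])] = value


def se_linear_combination(weights):
    """SE of sum_i w_i * estimate_i, computed as sqrt(w' V w)."""
    var = sum(wi * wj * cov[(pi, pj)]
              for pi, wi in weights.items()
              for pj, wj in weights.items())
    return math.sqrt(var)


se_roble = se_linear_combination({"alpha1": 1, "nu3": 1})
se_rauli = se_linear_combination({"alpha1": 1, "nu2": 1})
print(round(se_roble, 4), round(se_rauli, 4))
```

With weights of 1 on the intercept and on one group effect, this reproduces the SEs of
0.4041 and 0.4236 given for the Roble and Rauli intercepts.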

In fitting the SLR with groups, we assume that the model deviations obey the assump- 
tions stated in Sections 4.1 and 12.1. In particular, we assume that the deviations have a 
common variance and this implies that the variation is the same across all groups. You can 
check this assumption graphically by identifying groups, using different colours or sym- 
bols, in residual plots (see Section 5.2). It is not possible to use Bartlett's test (Section 5.3) 
here because the group sample variances are influenced by the values of the explanatory 
variate as well as by background variation. If variances differ between groups, and this 
heterogeneity cannot be corrected by applying a single transformation across all groups, 
then our assumptions no longer apply and conclusions from the ANOVA F-tests may be 
misleading. In this case, a weighted analysis might be appropriate but this is beyond the 
scope of this book (see Draper and Smith, 1998, or Montgomery et al., 2012). 

As long as the residual df are not too small, then it is reasonable to perform model selec- 
tion from the sequential ANOVA table as described above. If the residual df (ResDF) are 
small, then the estimate of background variation from the full model will be poor, giving 
low power for the analysis. In this case, if there is no evidence for the separate lines model, 
it is sensible to drop the x.grp term from the model and refit the parallel lines model to 
obtain a revised sequential ANOVA table. The incremental sums of squares and df for the 
model terms x and grp will be the same as those in the original table, but the combined 
term (x.grp) will now be merged with the residual to obtain a revised estimate of back- 
ground variation. The variance ratios for terms x and grp are then calculated with respect 
to the revised residual mean square and will differ from those in the original table. This 
difference will usually be small if the ResDF in the original table were large (>30) or if the
number of groups is small. Both sets of F-tests - from the original and revised tables - are
valid, although their conclusions may differ. As a (somewhat arbitrary) rule of thumb, we 
suggest constructing the revised table when ResDF < 10. In either case, once a model has 
been selected it is conventional to refit that model and to use its residual mean square as a 
basis for parameter SEs. 
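
As a purely arithmetical illustration of the revised table, the sketch below pools the
interaction line of Table 15.5 into the residual. With ResDF = 35 the rule of thumb above would
not call for this revision in the stand density example; the numbers are used only because
they are to hand.

```python
# Values copied from Table 15.5 (separate lines model as the full model).
res_ss, res_df = 2.0419, 35
xgrp_ss, xgrp_df = 0.2011, 2                 # + logQD.Type line
terms = {"+ logQD": (1, 4.5833), "+ Type": (2, 1.9403)}

# Dropping x.grp merges its SS and df into the residual.
rev_ss = res_ss + xgrp_ss
rev_df = res_df + xgrp_df
rev_ms = rev_ss / rev_df

# Incremental SS and df are unchanged; only the divisor changes.
revised = {term: (ss / df) / rev_ms for term, (df, ss) in terms.items()}
print(f"Revised residual: SS = {rev_ss:.4f}, df = {rev_df}, MS = {rev_ms:.4f}")
for term, f in revised.items():
    print(f"{term}: revised variance ratio = {f:.2f}")
```

The revised variance ratios (about 75.6 and 16.0) are close to the original 78.562 and 16.629,
as expected when the original ResDF is large.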

Finally, common sense is required in fitting these models. It is usually reasonable to 
compare behaviour across groups only if the range of the explanatory variate is simi- 
lar across those groups. If there is not a strong overlap, there may be some ambiguity 
between differences in group intercepts and correlation with the explanatory variate. In 
some extreme cases, the sign of the estimated slopes can change depending on whether 
separate group intercepts are included in the model (e.g. see Figure 15.11c). Furthermore, 
comparison across groups is sensible only when there are sufficient observations within 
each group to give confidence in the conclusions. 



15.1.3 An Alternative Sequence of Models 

The sequence of models considered in the previous section started with the single line 
(SLR) model. We then added a factor to allow separate intercepts for each group, i.e. the 
parallel lines model, and finally added the interaction between the factor and variate to 
allow separate slopes for each group. Alternatively, we might start by adding the grouping 
factor into the null model giving a model of the form 

$$y_{jk} = \alpha + \nu_j + e_{jk} .$$

This model consists of a set of parallel lines with zero slope. As discussed previously, this
model is over-parameterized, as it has t + 1 parameters to describe only t intercepts and so
we introduce a constraint. Again, we use the first-level-zero parameterization so the model
takes the form

$$y_{jk} = \alpha_1 + \nu_j + e_{jk} ,$$

with $\nu_1 = 0$. As there is no explanatory variate in the model, the 'intercept' parameters can
here be interpreted in terms of the population means for each group, where the parameter
$\alpha_1$ represents the population mean for group 1, and the effect $\nu_j$ represents the difference
between the population means for the jth and first groups. This model fits a separate
population mean for each group and so is called the separate groups model, with symbolic form

Explanatory component: [1] + grp 

This is exactly the same model as used for a single explanatory factor in Chapter 4, with 
the model df equal to the number of groups minus one, i.e. t-1. 
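
For the separate groups model, the least squares estimates under the first-level-zero
parameterization can be read directly from the group sample means. The sketch below shows this
with made-up responses for three groups; the data and variable names are hypothetical and serve
only to illustrate the relabelling.

```python
from statistics import mean

# Hypothetical responses for three groups (made-up numbers for illustration).
data = {
    1: [4.0, 5.0, 6.0],
    2: [7.0, 9.0],
    3: [2.0, 3.0, 4.0, 3.0],
}

group_means = {j: mean(y) for j, y in data.items()}

# First-level-zero parameterization: alpha_1 is the mean of group 1 and
# nu_j is the difference between the means of group j and group 1.
alpha_1 = group_means[1]
nu = {j: m - alpha_1 for j, m in group_means.items()}

print(alpha_1)   # estimate of the mean of group 1
print(nu)        # nu[1] is zero by construction
```

Three group means are described by three free parameters ($\alpha_1$, $\nu_2$, $\nu_3$), giving
model df of t - 1 = 2 once the intercept is excluded.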

Adding an explanatory variate into the separate groups model gets us back to the paral- 
lel lines model with a common slope and separate intercepts. We therefore have two routes 
to this model: we can either add the variate and then the factor into the model or vice versa. 
The combined term is then added to obtain the separate lines model. These two sequences 
of models are illustrated in Figure 15.5.

FIGURE 15.5
Two sequences of models for regression with groups. The diagram links the null model [1] to
the parallel lines model [1] + x + grp by two routes: through the single line model [1] + x
(adding x, then grp) or through the separate groups model [1] + grp (adding grp, then x).
Adding x.grp to the parallel lines model gives the separate lines model [1] + x + grp + x.grp.

We now have two possible sequential ANOVA tables. Except in the balanced case, where
observations within each group are made at the same values of the explanatory variate
and the two variables are orthogonal, the incremental sums of squares for adding the
variate (x) and factor (grp) terms will differ between these two tables. The incremental SS
and F-test for the combined term (x.grp) is the same in both sequences. As in Section 14.4,
the aim of analysis is to identify the simplest model that describes the pattern in the data,
and we suggest the following procedure. Starting with the separate lines model, we test
whether simplification to the parallel lines model is permissible. If it is, we refit the model
if the ResDF are small; otherwise we work from the two original sequential ANOVA tables.
We then have two choices: we might drop either the grouping factor (to give the single line
model) or the explanatory variate (to give the separate groups model). We assess the F-tests
associated with both options. If both are significant then we cannot simplify the parallel
lines model. If the F-test for dropping the variate is significant, but that for dropping the
grouping factor is not, then we drop the factor to obtain the single line model, and then test
whether the variate should be retained in the model. If the F-test for dropping the grouping
factor is significant, but that for dropping the variate is not, then we drop the variate to
obtain the separate groups model, then test whether the groups should be retained in the
model. If neither test is significant, then we drop the least significant term (i.e. that with the
largest observed significance level) first, then test the other.
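
The suggested procedure can be summarized as a small decision function. This is a sketch
under stated assumptions: the function name, argument names and the default significance level
of 0.05 are all chosen for this illustration, the observed significance levels are taken as
given, and the optional refit when ResDF is small is not represented.

```python
def choose_model(p_xgrp, p_x_given_grp, p_grp_given_x,
                 p_x_alone, p_grp_alone, level=0.05):
    """Follow the simplification steps described above.

    p_xgrp         : P for x.grp added to the parallel lines model
    p_x_given_grp  : P for x added after grp (factor-first table)
    p_grp_given_x  : P for grp added after x (variate-first table)
    p_x_alone      : P for x added to the null model
    p_grp_alone    : P for grp added to the null model
    """
    if p_xgrp < level:
        return "separate lines"
    keep_x = p_x_given_grp < level
    keep_grp = p_grp_given_x < level
    if keep_x and keep_grp:
        return "parallel lines"
    if keep_x != keep_grp:
        drop_grp = keep_x              # only one term is non-significant
    else:
        # Neither significant: drop the least significant term first.
        drop_grp = p_grp_given_x >= p_x_given_grp
    if drop_grp:
        return "single line" if p_x_alone < level else "null"
    return "separate groups" if p_grp_alone < level else "null"


# P values from Tables 15.5 and 15.6; "< 0.001" is entered as its upper bound.
print(choose_model(p_xgrp=0.193, p_x_given_grp=0.001, p_grp_given_x=0.001,
                   p_x_alone=0.001, p_grp_alone=0.089))
```

For the stand density data this returns the parallel lines model, in agreement with the
conclusion reached from the two sequential ANOVA tables.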

EXAMPLE 15.1J: STAND DENSITY OF MIXED NOTHOFAGUS FOREST PLOTS

For the stand composition data, the sequential ANOVA table obtained by addition of the 
Type factor into the model first is Table 15.6. 

TABLE 15.6
Sequential ANOVA Table for Regression with Groups for Logged Stand Density
with Factor Type and Variate logQD: Adding the Grouping Factor into the Model
First (Example 15.1J)

Term Added     Incremental df   Incremental SS   Mean Square   Variance Ratio           P
+ Type         2                0.3025           0.1512        F^Type = 2.592           0.089
+ logQD        1                6.2212           6.2212        F^logQD = 106.636        < 0.001
+ Type.logQD   2                0.2011           0.1006        F^logQD.Type = 1.724     0.193
Residual       35               2.0419           0.0583
Total          40               8.7667

SS = sum of squares.

In Example 15.1H, we established that we did not require separate slopes for each
group and therefore we start from the parallel lines model. The incremental F-test for
the logQD variate (after fitting the Type factor) is statistically significant ($F_{1,35}$ = 106.64,
P < 0.001, Table 15.6), indicating that the common slope in response to the logged
quadratic diameter variate was non-zero. In the original ANOVA table, the incremental
F-test for the Type factor (after fitting the logQD variate) was also highly significant
($F_{2,35}$ = 16.63, P < 0.001, Table 15.5), indicating that separate intercepts were required.
This confirms the parallel lines model cannot be further simplified and is therefore the
most suitable model for these observations.



15.1.4 Constraining the Intercepts 

Another model that might be considered relevant is one with a common intercept for 
all t groups but separate slopes for each group, known as the common intercept model. 
This model is based on the assumption that the straight line responses associated 
with the different groups all converge at a single point when x = 0. This model can be 
obtained by addition of the interaction directly to the single line model, and takes the
symbolic form

Explanatory component: [1] + x + x.grp

Here, the term [1] corresponds directly to the common intercept, the variate x provides
a common slope across all groups and x.grp specifies a different slope for each of the t
groups. In mathematical form, and with first-level-zero parameterization, the resulting
model is

$$y_{jk} = \alpha + (\beta_1 + \eta_j) x_{jk} + e_{jk} .$$

An example of this model is shown with artificial data in Figure 15.6a: the fitted lines
diverge from a common intercept at x = 0.

FIGURE 15.6
Non-invariance of the common intercept model: (a) fitted model (lines) for two groups (•, ■) and
untransformed explanatory variate; (b) fitted model (lines) with centered explanatory variate.

Except in special circumstances, we advise against the use of this model. The arguments
here are analogous to those against regression through the origin in Section 12.9.2. For
this model to be useful, we first require that the origin (x = 0) is within (or close to) the
range of the data, because we cannot objectively evaluate whether the common intercept
model gives a good fit unless some observations are present in this region. Second, the
origin should have some absolute meaning within the biological context of the data. This
is required because changing the position of the origin, for example, as when changing
from the Celsius to Fahrenheit scale of temperature, will also change the required point of
convergence. Figure 15.6b illustrates the fitted common intercept model when the position
of the origin is changed by standardization of the explanatory variate: the fitted model is
very different and now inappropriate for the observed response.
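
The dependence on the origin can also be seen algebraically. As a sketch, suppose the
explanatory variate is shifted by a constant c (a symbol introduced here for illustration only),
so that $x_{jk} = x'_{jk} + c$, and write the slope for group j as $\beta_j = \beta_1 + \eta_j$.
The common intercept model then becomes

```latex
\begin{align*}
y_{jk} &= \alpha + \beta_j x_{jk} + e_{jk} \\
       &= \alpha + \beta_j (x'_{jk} + c) + e_{jk} \\
       &= (\alpha + \beta_j c) + \beta_j x'_{jk} + e_{jk} .
\end{align*}
```

On the shifted scale the intercepts $\alpha + \beta_j c$ differ between groups whenever the
slopes differ and c is non-zero. A common intercept model fitted to the shifted variate is
therefore a different model from one fitted to the original variate, whereas the separate lines
model simply absorbs the shift into its group-specific intercepts.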

In general, we follow the principles of marginality (previously discussed in Sections 
8.2.1, 8.3 and 11.2.2) also within the context of regression modelling. We consider a term as 
marginal to any term of which it is a sub-term; for example, terms A and B are both mar- 
ginal to term A.B. The principle of marginality requires that for each term included in a 
model, all terms that are marginal to it, i.e. all sub-terms, should also be included. The only 
exception occurs when some sub-terms are not meaningful, for example, in nested mod- 
els of form A/B, we do not require term B in the model before we fit A.B. We previously 
discussed this principle in the context of models containing only factors, but we apply it 
to all models. For example, the terms x and grp are both marginal to the term x.grp. The 
term [1] is regarded as marginal to all other terms. A model built by progressive addition 
of terms should only allow a term to be added if all of its sub-terms are already present, 
so we can add x.grp only to a model that already contains terms x and grp. Conversely, a 
term should not be dropped from a model if terms that it is marginal to are present in the 
model. For example, we should not drop term grp from a model that also contains term 
x.grp. Use of this principle with explanatory variates ensures that a model is invariant to 
changes of scale. The separate slopes with common intercept model discussed in this sec- 
tion disobeys the rule of marginality, as it does not include the sub-term grp, and so is not 
robust to change of scale. 
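
A marginality check of this kind is easy to automate. The following sketch lists, for each term
in a model, any sub-terms that are missing. It assumes that terms are written with "." between
components in a consistent order, treats the intercept [1] as always present, and makes no
allowance for the nested-model exception; the function names are chosen for this illustration.

```python
def sub_terms(term):
    """All proper sub-terms of a term written with '.' (e.g. 'x.grp')."""
    parts = term.split(".")
    n = len(parts)
    subs = set()
    # Enumerate every non-empty proper subset of the components, in order.
    for mask in range(1, 2 ** n - 1):
        subs.add(".".join(p for i, p in enumerate(parts) if mask >> i & 1))
    return subs


def violations(model_terms):
    """Map each term to the sub-terms that are missing from the model."""
    present = set(model_terms)
    missing = {}
    for term in model_terms:
        absent = sub_terms(term) - present
        if absent:
            missing[term] = sorted(absent)
    return missing


print(violations(["x", "grp", "x.grp"]))   # separate lines model
print(violations(["x", "x.grp"]))          # common intercept model
```

The separate lines model produces no violations, whereas the common intercept model is
flagged because the sub-term grp is absent.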



15.2 Incorporating Groups into the Multiple Linear Regression Model 

We now generalize the regression with groups model to allow several explanatory vari- 
ates, incorporating groups into a multiple linear regression model. The aim of analysis 
stays the same, namely, to find as simple a model as possible that describes the response 
well. We first describe the general form of the model, and then use an example with two 
explanatory variates to illustrate the procedure of model selection in this relatively simple 
case. 

For a model with q explanatory variates and a single factor with t groups, the most com- 
plex model allows a separate intercept for each group and a separate slope for each group 
for each explanatory variate. In the simplest parameterization, this model can be written 
in mathematical form as 



$$y_{jk} = \alpha_j + \beta_{1j} x_{1jk} + \ldots + \beta_{lj} x_{ljk} + \ldots + \beta_{qj} x_{qjk} + e_{jk} , \qquad (15.4)$$

where, as previously, the units are labelled by index j indicating the group (j = 1 ... t)
and index k labelling the observations within each group. The index l is used to identify
the explanatory variates (l = 1 ... q) in the model. The model presented in Equation 15.4
has t intercepts ($\alpha_1$ to $\alpha_t$) and q x t slope parameters ($\beta_{11} \ldots \beta_{1t}$ to
$\beta_{q1} \ldots \beta_{qt}$). Each group is
associated with one intercept and q slope parameters, one for each explanatory variate.
This model can be written in symbolic form as

Explanatory component: grp + x_1.grp + ... + x_l.grp + ... + x_q.grp

As before, grp is a factor (which allows a separate intercept for each group) and the term
x_l.grp is a combination of the grp factor and the lth explanatory variate, x_l, which allows
that variate to have a separate slope for each group.

In practice, we use a somewhat more complex form of the model by progressively adding
terms to the null model. With first-level-zero parameterization, this results in the
mathematical form

$$y_{jk} = (\alpha_1 + \nu_j) + (\beta_{11} + \eta_{1j}) x_{1jk} + \ldots + (\beta_{l1} + \eta_{lj}) x_{ljk} + \ldots + (\beta_{q1} + \eta_{qj}) x_{qjk} + e_{jk} ,$$

with $\nu_1 = 0$ and $\eta_{l1} = 0$ for l = 1 ... q. Then, $\alpha_1$ is the intercept for the first group and
$\nu_j$ is the difference between intercepts for the jth and the first groups. For the lth explanatory
variate, $\beta_{l1}$ is the slope for the first group and $\eta_{lj}$ is the difference between the slopes
for the jth and the first groups. In symbolic form, this model can be written as

Explanatory component: [1] + x_1 + ... + x_l + ... + x_q + grp
                       + x_1.grp + ... + x_l.grp + ... + x_q.grp

In fitting this more complex form of model, we need to be aware of several potential
problems. First, there may be collinearity within the set of explanatory variates. This
can be investigated with exploratory scatter plots and the methods described in Section
14.7. If variates are partially collinear, this can introduce ambiguity into the model
selection process. If very strong collinearity is present then the model may become
unstable and one or more variates should be omitted. Second, in order for the separate lines
model to be sensible, we now require a good overlap of values across groups for each of
the explanatory variates as well as a reasonable number of observations in each group.
Third, we may need to modify our strategy for model selection. With many explanatory
variates, many different sequential ANOVA tables can be constructed by adding the
variates into the model in different orders. If the full model, with separate slopes for each
explanatory variate, makes biological sense and has a reasonable number of residual df
then it is sensible to start from this model and use marginal F-tests to simplify it. When
a term is dropped, the model is refitted and a new set of marginal F-tests calculated.
An alternative strategy is to start from an intermediate model and to consider adding
or dropping terms, refitting the model and recalculating the F-statistics each time the
model is changed. In both cases, the principle of marginality should be respected: a term
should be added only when all of its sub-terms are already present and a term should
not be dropped if it is a sub-term of another term still in the model. As previously, you
should use diagnostic plots to check the fit of the model. We illustrate some of these
principles in Example 15.2.
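
When simplifying from the full model, the marginality rule determines which terms are
candidates for a marginal F-test at each step. The sketch below generates the terms of the full
model for a set of variates and lists the terms that may be dropped. The variate names, the
number of groups and the function names are hypothetical choices for this illustration.

```python
def full_model_terms(variates, factor="grp"):
    """Terms of the full model: variates, the factor, then each interaction."""
    return list(variates) + [factor] + [f"{v}.{factor}" for v in variates]


def droppable(terms):
    """Terms that are not a sub-term of any other term still in the model."""
    keep = []
    for term in terms:
        parts = set(term.split("."))
        is_sub = any(other != term and parts < set(other.split("."))
                     for other in terms)
        if not is_sub:
            keep.append(term)
    return keep


variates = ["x1", "x2"]                  # q = 2 explanatory variates
t = 4                                    # number of groups (hypothetical)
terms = full_model_terms(variates)
n_params = t * (len(variates) + 1)       # t intercepts plus q x t slopes
print(terms, n_params)
print(droppable(terms))                  # only the interactions at first
terms.remove("x1.grp")
print(droppable(terms))                  # x1 becomes eligible once x1.grp has gone
```

After each term is dropped the model would be refitted and the marginal F-tests recalculated
for the new list of eligible terms.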

EXAMPLE 15.2: WEED SEED ABUNDANCE 

An observational study was done to investigate whether the number of seeds produced
by rye-grass could be related to plant characteristics. Between 17 and 24 samples were
collected from each of four study sites (factor Site, with levels C, L, P and W). At each
sample point within each site, the total number of seeds was counted and converted to
number per m² (response variate TotalSeed), and the average head length (in mm, variate
HLength) and average number of spikelets per head (variate Spikelets) on plants was
recorded. The data are in Table 15.7 and file weedseed.dat. Before analysis, we transform
the response variable using the log10-transformation, i.e. logSeed = log10(TotalSeed), to
achieve homogeneity of variance of the residuals (see Chapter 6).

Preliminary exploration of the data indicates a fairly strong relationship between 
the two explanatory variates (sample correlation of r = 0.67, Figure 15.7). This implies 
that there may be some ambiguity as to which variate best explains differences in 
logged seed numbers. The range of both variates varies substantially across the four 
sites. 

Plotting the log10(number of seeds) against the number of spikelets or head length
suggests a positive but noisy relationship in both cases (Figure 15.8), and that logged



TABLE 15.7
Observations of Total Seed Number per m² (Total) with Average Head Length (HL) and Number
of Spikelets per Plant (Spikes) at Four Sites (Labelled C, L, P and W) (Example 15.2 and File
weedseed.dat)

Site  HL     Spikes  Total     Site  HL     Spikes  Total     Site  HL     Spikes  Total
C     26.74  26.1    6232      L     26.21  21.1    13,320    P     33.76  26.4    120,937
C     23.77  28.7    6435      L     23.87  21.2    15,116    P     36.35  34.1    127,307
C     28.80  25.2    7022      L     27.58  21.1    15,243    P     35.98  28.7    137,416
C     29.89  30.1    10,700    L     23.57  19.6    16,830    P     31.22  27.5    161,070
C     31.01  29.1    11,524    L     28.33  24.7    18,856    P     33.96  25.4    162,154
C     26.00  26.3    12,814    L     24.39  26.1    20,930    P     32.76  27.7    173,734
C     31.25  21.4    13,093    L     20.95  21.7    24,200    P     29.47  27.7    181,971
C     33.48  27.7    14,991    L     30.03  24.6    24,369    P     32.35  25.9    190,408
C     24.86  28.0    15,137    L     26.92  26.0    24,944    P     35.48  30.8    215,477
C     25.71  28.0    16,162    L     27.47  23.1    25,097    P     34.53  30.5    245,200
C     26.79  27.9    16,956    L     26.46  27.0    28,136    P     30.59  27.2    246,758
C     31.19  27.8    16,962    L     28.19  23.1    31,434    P     31.75  29.2    300,595
C     30.79  26.8    17,234    L     27.81  21.8    33,256    W     20.09  19.8    59,321
C     33.17  28.7    17,409    L     27.26  21.6    34,690    W     21.92  22.6    59,960
C     30.74  26.1    17,414    L     31.30  24.2    38,623    W     18.21  18.0    62,700
C     31.62  25.5    18,828    L     27.20  21.5    54,260    W     22.74  20.4    66,096
C     30.22  28.1    22,611    L     30.38  25.9    58,827    W     26.92  24.9    78,618
C     31.59  28.6    23,690    P     28.28  24.2    35,042    W     23.00  22.1    80,400
C     29.72  26.1    25,108    P     31.50  25.1    42,312    W     24.91  25.9    84,607
C     31.23  27.6    26,121    P     38.44  27.5    60,867    W     19.99  19.2    90,436
C     37.12  30.0    28,161    P     33.99  25.1    62,047    W     20.80  21.0    93,800
C     33.12  27.5    28,637    P     29.31  23.4    65,286    W     23.85  24.3    105,700
C     25.22  27.2    31,439    P     29.16  24.6    80,327    W     20.35  23.1    106,321
L     19.66  23.5    7084      P     33.27  23.6    96,021    W     22.05  22.0    106,480
L     27.94  22.3    10,436    P     27.02  24.7    96,173    W     23.67  21.5    122,820
L     32.14  22.8    11,119    P     33.89  24.6    100,352   W     28.43  24.1    128,132
L     27.40  22.0    11,613    P     30.94  24.0    104,777   W     18.84  19.4    130,834
L     26.93  22.8    11,883    P     25.73  18.0    112,500   W     21.24  21.6    157,896
L     26.03  23.8    12,824    P     32.90  25.3    117,237   W     19.86  20.8    171,947

Source: Data from R. Alarcon-Reverte, Rothamsted Research.

FIGURE 15.7

Average head length (mm) vs average number of spikelets per head at four sites: A C (site 1), • L (site 2), ■ P (site 
3), T W (site 4) (Example 15.2). 




seed numbers may be more related to differences between sites than to either of the 
variates. For this reason, we fit factor Site as the first term in the model, followed by 
the two explanatory variates, and then the combined terms of factor Site with each of 
the two variates. 

As there are between 17 and 24 observations at each site, it is a reasonable strategy to 
fit the full separate lines model and then seek to simplify it. The full model is expressed 
in symbolic form as 

Response variable: logSeed

Explanatory component: [1] + Site + HLength + Spikelets + HLength.Site
+ Spikelets.Site

This model can be written in mathematical form, with first-level-zero parameterization,
as

$$\mathit{logSeed}_{jk} = (\alpha_1 + \nu_j) + (\beta_{11} + \eta_{1j})\,\mathit{HLength}_{jk} + (\beta_{21} + \eta_{2j})\,\mathit{Spikelets}_{jk} + e_{jk},$$

with $\nu_1 = 0$, $\eta_{11} = 0$ and $\eta_{21} = 0$. In this model, the response $\mathit{logSeed}_{jk}$ represents the kth
observation at the jth site, with $j = 1 \ldots 4$, $k = 1 \ldots n_j$ and $n_1 = 23$, $n_2 = 23$, $n_3 = 24$ and
$n_4 = 17$, and corresponding values of the explanatory variates indicated by $\mathit{HLength}_{jk}$
and $\mathit{Spikelets}_{jk}$. The overall intercept, $\alpha_1$ (associated with term [1]), represents the
intercept at the first site. The site-specific intercepts, $\nu_j$ (associated with term Site), represent
the difference in intercepts between the jth and the first sites. The overall slope for
explanatory variate head length, $\beta_{11}$ (associated with the term HLength), represents the
slope with respect to head length at the first site. The site-specific slopes of head length,
$\eta_{1j}$ (associated with term HLength.Site), represent the difference between slopes at the
jth and the first sites. Overall and site-specific slope parameters for the number of spikelets,
$\beta_{21}$ and $\eta_{2j}$ (associated with terms Spikelets and Spikelets.Site), are interpreted
similarly.

FIGURE 15.8

Total number of seeds per m² (log10 scale) vs (a) average head length (mm) and (b) average number of spikelets
per head at four sites: ▲ C, • L, ■ P, ▼ W (Example 15.2).

Table 15.8 shows two sequential ANOVA tables in abbreviated form, the first con- 
structed by adding head length before number of spikelets, and the second using the 
other order. Recall that the estimated parameters are the same from the two fits, but 
the sequential ANOVA tables differ because of the partial collinearity between the two 
explanatory variates. The full model has a total of 12 parameters (one intercept and 
two slopes for each of the four sites) and can be interpreted as fitting separate planes in 
terms of the two explanatory variates for each site. 
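The order dependence of a sequential ANOVA can be illustrated numerically. The sketch below does not use the weed seed data: it simulates four sites with two positively correlated variates (the sample sizes, coefficients, seed, and the helper names `rss` and `sequential_ss` are all assumptions made for illustration). It builds the separate planes design matrix with first-level-zero parameterization and computes the incremental sums of squares for the two fitting orders.

```python
import numpy as np

rng = np.random.default_rng(2024)

# Simulated data (an assumption for illustration, not the weed seed data):
# four sites, 20 observations each, two positively correlated variates.
n_sites, n_per_site = 4, 20
site = np.repeat(np.arange(n_sites), n_per_site)
x1 = rng.normal(30.0, 3.0, size=site.size)             # stands in for HLength
x2 = 0.6 * x1 + rng.normal(7.0, 2.0, size=site.size)   # stands in for Spikelets
y = 3.5 + 0.3 * site + 0.02 * x1 + 0.03 * x2 + rng.normal(0.0, 0.2, size=site.size)

# First-level-zero parameterization: indicator columns for sites 2 to 4 only.
site_dummies = (site[:, None] == np.arange(1, n_sites)).astype(float)
terms = {
    "Site": site_dummies,
    "x1": x1[:, None],
    "x2": x2[:, None],
    "x1.Site": site_dummies * x1[:, None],
    "x2.Site": site_dummies * x2[:, None],
}


def rss(X, response):
    """Residual sum of squares from a least-squares fit of response on X."""
    coef, *_ = np.linalg.lstsq(X, response, rcond=None)
    resid = response - X @ coef
    return float(resid @ resid)


def sequential_ss(order):
    """Incremental SS for each term, added in the given order after [1]."""
    X = np.ones((y.size, 1))
    previous = rss(X, y)
    increments = {}
    for name in order:
        X = np.hstack([X, terms[name]])
        current = rss(X, y)
        increments[name] = previous - current
        previous = current
    return increments, X


ss_a, X_full = sequential_ss(["Site", "x1", "x2", "x1.Site", "x2.Site"])
ss_b, _ = sequential_ss(["Site", "x2", "x1", "x2.Site", "x1.Site"])

n_params = X_full.shape[1]  # 1 + 3 + 1 + 1 + 3 + 3 columns
print("parameters in full model:", n_params)
for name in ss_a:
    print(f"{name:8s} order (a): {ss_a[name]:8.4f}   order (b): {ss_b[name]:8.4f}")
```

Both orders end at the same full model, so the incremental sums of squares add to the same total, but the amount attributed to each variate changes with the order because the variates are correlated.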

The next step in our analysis is the identification of a suitable model. We first
consider the combined terms, Spikelets.Site and HLength.Site. We can obtain marginal
F-tests for these terms from the sequential ANOVA tables in Table 15.8 and find that
the F-statistic for dropping Spikelets.Site is less significant ($F_{3,75}$ = 0.65, P = 0.584,
Table 15.8a) and so drop this term. As there are 75 residual df in the full model, we do
not refit the model, but can immediately consider whether we can then drop the term
HLength.Site. Its F-statistic is not significant ($F_{3,75}$ = 1.71, P = 0.172, Table 15.8a) and so
we also drop this term. There is therefore no evidence for separate slopes across sites
for either of these variates.

We then consider whether we can drop either of the explanatory variates. We find
that if we add the HLength variate after the Site factor and the Spikelets variate, there
is no significant improvement to the model ($F_{1,75}$ = 2.32, P = 0.132, Table 15.8b). If we
add the Spikelets variate after Site and HLength, then there is a small and borderline-
significant improvement to the model ($F_{1,75}$ = 3.82, P = 0.054, Table 15.8a). This indicates
that HLength adds no information once Spikelets is in the model, but Spikelets may add
a little information when HLength is in the model. We therefore drop the HLength variate
from the model. This leaves the Site factor and Spikelets variate in the model; i.e. a
parallel lines model in terms of Spikelets. We refit this model in both orders, giving the
abbreviated sequential ANOVA tables in Table 15.9.

TABLE 15.8

Abbreviated Sequential ANOVA Tables for Separate Lines Model for Logged Seed Counts
with Factor Site and Explanatory Variates Spikelets and HLength (Example 15.2)

(a)                                                 (b)
Term Added        Inc. df  Mean Square  P           Term Added        Inc. df  Mean Square  P
+ Site            3        4.406        < 0.001     + Site            3        4.406        < 0.001
+ HLength         1        0.291        0.009       + Spikelets       1        0.351        0.004
+ Spikelets       1        0.155        0.054       + HLength         1        0.095        0.132
+ HLength.Site    3        0.070        0.172       + Spikelets.Site  3        0.024        0.623
+ Spikelets.Site  3        0.027        0.584       + HLength.Site    3        0.072        0.159
Residual          75       0.041                    Residual          75       0.041
Total             86       0.198                    Total             86       0.198

Inc. = incremental.

TABLE 15.9

Abbreviated Sequential ANOVA Tables for Separate Lines Model for Logged Seed Counts
with Factor Site and Explanatory Variate Spikelets (Example 15.2)

(a)                                                 (b)
Term Added        Inc. df  Mean Square  P           Term Added        Inc. df  Mean Square  P
+ Site            3        4.406        < 0.001     + Spikelets       1        0.002        0.815
+ Spikelets       1        0.351        0.005       + Site            3        4.522        < 0.001
Residual          82       0.042                    Residual          82       0.042
Total             86       0.198                    Total             86       0.198

Inc. = incremental.

The incremental F-test for the Spikelets variate is not significant when it is fitted first 
(Table 15.9b), but is highly significant when fitted after factor Site (Table 15.9a). Factor 
Site is highly significant for both orders of fitting. The fitted model in Figure 15.9 shows 
the reason for this: differences in logged seed count between sites are so large that the 
relationship with number of spikelets can be detected only after we have corrected for 
this effect. This parallel lines model, with regression on Spikelets, is therefore our pre- 
dictive model. Residual plots (not shown) indicate no conflict with the model assump- 
tions, and plotting the residuals against the omitted HLength variate shows no evidence 
of any relationship. The predictive model accounts for 78.8% of the variation in the data
(adjusted $R^2$ = 0.788), and can be written in simple form as

$$\hat{\mu}_j(\mathit{Spikelets}) = \hat{\alpha}_j + \hat{\beta}\,\mathit{Spikelets},$$

where $\hat{\mu}_j(\mathit{Spikelets})$ represents the prediction on the log scale at the jth site for the
specified number of spikelets. The parameter estimates are listed in Table 15.10.




FIGURE 15.9

Logged number of seeds with predictive parallel lines model (line) in terms of average number of spikelets per
head at each of four sites: ▲ C, • L, ■ P, ▼ W (Example 15.2).

TABLE 15.10

Parameter Estimates with Standard Errors (SE), t-Statistics (t) and Observed
Significance Levels (P) for a Parallel Lines Model for Logged Seed Counts in
Terms of Variate Spikelets and Factor Site with Four Levels (Example 15.2)

Term       Parameter    Estimate  SE       t       P
Site 1     $\alpha_1$   3.450     0.2646   13.040  < 0.001
Site 2     $\alpha_2$   3.671     0.2249   16.324  < 0.001
Site 3     $\alpha_3$   4.343     0.2547   17.048  < 0.001
Site 4     $\alpha_4$   4.376     0.2142   20.432  < 0.001
Spikelets  $\beta$      0.0277    0.00955   2.895  0.005
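As a small illustration of how the estimates in Table 15.10 are used, the sketch below evaluates the parallel lines predictor on the log10 scale and back-transforms it. The function names and the choice of 25 spikelets are assumptions made for this example; the mapping of sites 1 to 4 onto the labels C, L, P and W follows the caption of Figure 15.7.

```python
import math

# Parameter estimates copied from Table 15.10 (log10 scale).
intercepts = {"C": 3.450, "L": 3.671, "P": 4.343, "W": 4.376}  # sites 1 to 4
slope = 0.0277  # change in log10(seed count) per additional spikelet


def predict_log10_seed(site, spikelets):
    """Prediction on the log10 scale for one site."""
    return intercepts[site] + slope * spikelets


def predict_seed(site, spikelets):
    """Back-transformed prediction (seeds per square metre)."""
    return 10.0 ** predict_log10_seed(site, spikelets)


spikelets = 25.0  # illustrative value, chosen only for this example
for site in intercepts:
    print(site,
          round(predict_log10_seed(site, spikelets), 4),
          round(predict_seed(site, spikelets)))

# Because the lines are parallel, the log10-ratio between two sites is the
# difference in intercepts and does not depend on the number of spikelets.
log_ratio_w_c = math.log10(predict_seed("W", 25.0) / predict_seed("C", 25.0))
print("log10 ratio W/C:", round(log_ratio_w_c, 3))
```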



Differences on the logarithm scale can be interpreted in terms of ratios on the original
scale (Section 6.4), and so the predictive model can be interpreted as a multiplicative
model. Writing $\tilde{\mu}_j$ for the prediction at the jth site on the original scale, we have

$$\tilde{\mu}_j(\mathit{Spikelets}) = 10^{\hat{\alpha}_j + \hat{\beta}\,\mathit{Spikelets}} = 10^{\hat{\alpha}_j} \times 10^{\hat{\beta}\,\mathit{Spikelets}}.$$

The difference between sites dominates the model. For a fixed number of spikelets, here
denoted s, the log10-ratio of seed counts between any two sites (labelled i and j) can be
written as

$$\log_{10}[\tilde{\mu}_i(s)/\tilde{\mu}_j(s)] = \hat{\alpha}_i + \hat{\beta}s - (\hat{\alpha}_j + \hat{\beta}s) = \hat{\alpha}_i - \hat{\alpha}_j.$$

We can therefore obtain a CI for the log-ratio in terms of a CI for the difference $\hat{\alpha}_i - \hat{\alpha}_j$
and then back-transform to get a CI for the ratio $\tilde{\mu}_i(s)/\tilde{\mu}_j(s)$. For example, consider sites
4 (W) and 1 (C). The estimated log10-ratio is

$$\log_{10}[\tilde{\mu}_4(s)/\tilde{\mu}_1(s)] = \hat{\alpha}_4 - \hat{\alpha}_1 = 4.376 - 3.450 = 0.926,$$

with SE 0.084. A 95% CI for the log10-ratio, $\log_{10}[\tilde{\mu}_4(s)/\tilde{\mu}_1(s)]$, can be calculated via the
difference $\hat{\alpha}_4 - \hat{\alpha}_1$ as (0.759, 1.093). We back-transform to estimate the ratio $\tilde{\mu}_4/\tilde{\mu}_1$ as
$10^{0.926} = 8.43$, so we expect 8.43 times as many seeds at a location in the fourth site as
in the first site (for the same number of spikelets), with a 95% CI for this ratio equal to
$(10^{0.759}, 10^{1.093}) = (5.74, 12.39)$.

Within a site, $\hat{\beta}$ = 0.0277 (SE = 0.0096) represents the expected increase in logged seed
count for an increase of one spikelet per plant. We can predict the relative change in
seed count for an increase of one spikelet in terms of $\hat{\beta}$ using

$$\frac{\tilde{\mu}_j(s+1)}{\tilde{\mu}_j(s)} = \frac{10^{\hat{\alpha}_j} \times 10^{\hat{\beta}(s+1)}}{10^{\hat{\alpha}_j} \times 10^{\hat{\beta}s}} = 10^{\hat{\beta}}.$$

We can therefore predict that seed count will increase by a factor of $10^{0.0277} = 1.066$
(a 7% increase) for an increase of one spikelet per plant. We can calculate a 95% CI for
$\beta$ as (0.0086, 0.0468) and can back-transform this to give a CI for the relative change as
$(10^{0.0086}, 10^{0.0468}) = (1.02, 1.11)$, corresponding to a 2-11% increase in seed count for plants
with one additional spikelet.
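The back-transformations above are simple arithmetic and can be checked directly. The sketch below takes the estimates and 95% limits quoted in the text as given (it does not recompute them from the data) and converts them from the log10 scale to ratios; the variable names are chosen only for this example.

```python
# Values quoted in the text and in Table 15.10 (log10 scale).
diff_log10 = 4.376 - 3.450        # intercept difference, site 4 (W) minus site 1 (C)
diff_ci_log10 = (0.759, 1.093)    # 95% CI for that difference
beta = 0.0277                     # slope for Spikelets
beta_ci = (0.0086, 0.0468)        # 95% CI for the slope

# Back-transform differences on the log10 scale into ratios.
site_ratio = 10 ** diff_log10
site_ratio_ci = tuple(10 ** limit for limit in diff_ci_log10)

# Relative change in seed count for one additional spikelet.
spikelet_factor = 10 ** beta
spikelet_factor_ci = tuple(10 ** limit for limit in beta_ci)
percent_increase = 100 * (spikelet_factor - 1)

print(f"W/C ratio: {site_ratio:.2f}, 95% CI ({site_ratio_ci[0]:.2f}, {site_ratio_ci[1]:.2f})")
print(f"factor per spikelet: {spikelet_factor:.3f}, "
      f"95% CI ({spikelet_factor_ci[0]:.2f}, {spikelet_factor_ci[1]:.2f})")
print(f"approximate increase: {percent_increase:.0f}%")
```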
15.3 Regression in Designed Experiments

Recall from Section 1.3 that we usually allow for both structural and explanatory com- 
ponents within our models. We have not yet allowed for a structural component within 
regression models because software for regression modelling does not generally support 
this (as discussed in Sections 11.6 and 12.1). It is possible to incorporate some types of 
structure within the explanatory component of the model, however; this is the intra- 
block analysis of Section 11.6.1 and, in the context of regression modelling, can be imple- 
mented as regression with groups. This approach can be used in studies where we wish 
to account for structure prior to estimation of the regression line. A common example 
is regression within a RCBD, where a fixed set of values of the explanatory variate are 
applied to experimental units within each block. The RCBD model allows for an addi- 
tive difference between responses to the same treatment from different blocks which, in 
combination with a linear response to the explanatory variate, corresponds exactly to a 
parallel lines model. When replication is present, we can formally test for lack of fit to 
the common regression line (see Section 12.8). Recall that we regard structural terms as 
intrinsic to the model, and so we do not formally test or omit these terms. We also treat 
structural terms differently to explanatory terms in the predictive model as, although 
we wish to account for structure, we do not usually wish to include it in predictions. 
When a parallel lines model is appropriate, we would usually present the response aver- 
aged over structural variables as our predictive model, and we interpret this as the pre- 
dicted response for the average conditions in the study. We illustrate these concepts in 
Example 15.3. 



EXAMPLE 15.3: FORAGE MAIZE YIELDS 

An experiment at Rothamsted Research in 1996 investigated the yield response of for- 
age maize to nitrogen fertilizer. The experiment was designed as a RCBD with three 
blocks (factor Block) of four plots (factor Plot), with nitrogen fertilizer rates of 0, 70, 
140 and 210 kg N (variate N). The whole crop forage yields from each plot (at 100% dry 
matter in tonnes/hectare, variate Yield) are shown in Table 15.11 and are held in file 
FORAGE.DAT. The aim of analysis is to model the yield as a function of applied nitrogen. 
We start by analysing the experiment as a RCBD, with model

Response variable: Yield

Explanatory component: FacN

Structural component: Block/Plot

where FacN is a factor with a separate level for each nitrogen application rate. The
resulting multi-stratum ANOVA is shown in Table 15.12.

TABLE 15.11

Whole Crop Forage Yield (Yield, t/ha) from a RCBD with Three Blocks and Four
Nitrogen Fertilizer Rates (N = 0, 70, 140, 210 kg N) (Example 15.3)

Block 1                  Block 2                  Block 3
Plot  N    Yield         Plot  N    Yield         Plot  N    Yield
1     0    10.42         1     70   11.62         1     70   11.13
2     140  12.21         2     0    11.98         2     210  12.57
3     210  12.85         3     210  12.81         3     0     9.82
4     70   12.22         4     140  12.67         4     140  10.92

Source: Data from P. Poulton, Rothamsted Research.

TABLE 15.12

Multi-Stratum ANOVA Table for Forage Yields from a RCBD
with Three Blocks (Factor Block) of Four Plots (Factor Plot) and
Four Nitrogen Treatments (Factor FacN) (Example 15.3)

Source of Variation   df  Sum of Squares  Mean Square  Variance Ratio  P
Block stratum
  Residual             2   2.8385         1.4192       4.399           0.067
Block.Plot stratum
  FacN                 3   6.1434         2.0478       6.347           0.027
  Residual             6   1.9359         0.3227
Total                 11  10.9178         0.9925

The residual plots are adequate and a plot of the predicted means shows a linear trend 
in applied nitrogen (Figure 15.10a). We could fit a linear polynomial contrast within the 
multi-stratum ANOVA, but an intra-block analysis is appropriate for a RCBD and we 
will take that option here. As a first step, we fit the parallel lines model and check for 
lack of fit (as in Section 12.8) with 

Explanatory component: Block + N + FacN 

This fits the structural factor Block first, then adds explanatory variate N to obtain the
parallel lines model with a separate intercept for each block. To test whether the straight
line captures the pattern of response, we then add the factor version of the explanatory
variate (FacN) into the model, giving the sequential ANOVA in Table 15.13. The SS for
Block is the same as in the multi-stratum ANOVA, the SS for the nitrogen treatments
has been partitioned into variation due to the straight line (N) and deviations from it
(FacN after eliminating N), and the residual SS remains unchanged. The straight line
accounts for most of the variation due to treatments and is highly significant ($F^{N}$ = 18.37,
P = 0.005). There is no statistical evidence for lack of fit ($F^{FacN}$ = 0.33, P = 0.729) and no
evidence of model misspecification in fitted model or residual plots when the lack-of-fit
term (FacN) is omitted. We therefore accept the parallel lines model as our predictive
model which, with first-level-zero parameterization, takes the form

$$\hat{\mu}_i(N) = \hat{\alpha}_1 + \mathit{Block}_i + \hat{\beta}N,$$

where $\hat{\mu}_i(N)$ is the predicted yield (t/ha) in the ith block ($i = 1 \ldots 3$) for nitrogen
application rate N ($0 \le N \le 210$), $\hat{\alpha}_1$ is the estimated intercept for the first block, $\mathit{Block}_i$ is
the difference in intercept between the ith and the first blocks, and $\hat{\beta}$ is the estimated
slope of the relationship with nitrogen application rate. These estimates are listed in
Table 15.14.

FIGURE 15.10

Predictive model (line) with 95% CI (lines) for forage maize yields (at 100% dry matter in tonnes/hectare) in terms of
nitrogen applied (kg) with (a) fitted treatment means (•) and LSD from multi-stratum ANOVA and (b) observed
data (• = block 1, ■ = block 2, ▲ = block 3) (Example 15.3).

TABLE 15.13

Sequential ANOVA for Forage Yields from a RCBD, Testing for Lack of Fit
to Parallel Lines Model (Example 15.3)

Term Added  Incremental df  Incremental SS  Mean Square  Variance Ratio        P
+ Block      2               2.8385         1.4192       4.399                 0.067
+ N          1               5.9283         5.9283       $F^{N}$ = 18.374      0.005
+ FacN       2               0.2150         0.1075       $F^{FacN}$ = 0.333    0.729
Residual     6               1.9359         0.3227
Total       11              10.9178         0.9925

SS = sum of squares.

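In this parallel lines model, we can summarize the overall performance by averaging
across blocks to obtain the predictive model for average conditions within the
trial as

$$\hat{\mu}(N) = \frac{1}{3}\sum_{i=1}^{3}\hat{\mu}_i(N) = \hat{\alpha}^* + \hat{\beta}N \quad \text{where} \quad \hat{\alpha}^* = \hat{\alpha}_1 + \frac{1}{3}\sum_{i=1}^{3}\mathit{Block}_i.$$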



TABLE 15.14

Parameter Estimates with Standard Errors (SE), t-Statistics (t) and
Observed Significance Levels (P) for Parallel Lines Model for
Forage Yield from a RCBD with Three Blocks and Explanatory
Variate N (Example 15.3)

Term     Parameter           Estimate  SE       t       P
[1]      $\alpha_1$          10.982    0.3279   33.487  < 0.001
Block 1  $\mathit{Block}_1$   0        -        -       -
Block 2  $\mathit{Block}_2$   0.345    0.3667    0.941  0.374
Block 3  $\mathit{Block}_3$  -0.815    0.3667   -2.223  0.057
N        $\beta$              0.0090   0.00191   4.696  0.002

The SE for the averaged intercept is calculated from the estimated variance-covariance
matrix for the estimated intercepts (see Section C.4). Here, the averaged intercept is
$\hat{\alpha}^*$ = 10.825 (SE = 0.2505). The final predictive model therefore takes the form

$$\hat{\mu}(N) = 10.825 + 0.0090N.$$

This predictive model is shown with 95% confidence intervals and the observed data in
Figure 15.10. It is clear that this predictive model gives a good fit to the original
treatment means and passes through the centre of the observed data.
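The intra-block analysis of Example 15.3 can be reproduced with ordinary least squares. The sketch below is one possible implementation using NumPy rather than the software used for the book: it fits the block-only, parallel lines and lack-of-fit models to the data of Table 15.11, forms the two variance ratios of Table 15.13, and obtains the averaged intercept with its SE from the estimated variance-covariance matrix. The helper name `fit` and the choice of indicator columns for the lack-of-fit term are assumptions of this example.

```python
import numpy as np

# Data from Table 15.11: (block, N applied in kg, yield in t/ha).
data = [
    (1, 0, 10.42), (1, 140, 12.21), (1, 210, 12.85), (1, 70, 12.22),
    (2, 70, 11.62), (2, 0, 11.98), (2, 210, 12.81), (2, 140, 12.67),
    (3, 70, 11.13), (3, 210, 12.57), (3, 0, 9.82), (3, 140, 10.92),
]
block, n_rate, yld = (np.array(col, dtype=float) for col in zip(*data))


def fit(X, y):
    """Least-squares coefficients and residual sum of squares."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return coef, float(resid @ resid)


one = np.ones_like(yld)
b2 = (block == 2).astype(float)   # first-level-zero: block 1 is the baseline
b3 = (block == 3).astype(float)

X_block = np.column_stack([one, b2, b3])
X_par = np.column_stack([one, b2, b3, n_rate])                  # + N
# FacN after N has 2 df; indicators for two of the rates span the deviations.
X_lof = np.column_stack([X_par, n_rate == 140, n_rate == 210])  # + FacN

_, rss_block = fit(X_block, yld)
coef, rss_par = fit(X_par, yld)
_, rss_lof = fit(X_lof, yld)

# Variance ratios use the residual MS of the full (lack-of-fit) model, 6 df.
ms_resid_full = rss_lof / (len(yld) - X_lof.shape[1])
f_line = (rss_block - rss_par) / ms_resid_full
f_lack = ((rss_par - rss_lof) / 2) / ms_resid_full

# Parallel lines model: estimates and averaged intercept with its SE.
alpha1, block2, block3, beta = coef
s2 = rss_par / (len(yld) - X_par.shape[1])          # residual MS on 8 df
vcov = s2 * np.linalg.inv(X_par.T @ X_par)
weights = np.array([1.0, 1.0 / 3.0, 1.0 / 3.0, 0.0])  # alpha1 + mean of block effects
alpha_star = float(weights @ coef)
se_alpha_star = float(np.sqrt(weights @ vcov @ weights))

print(f"F for N = {f_line:.3f}, F for lack of fit = {f_lack:.3f}")
print(f"alpha1 = {alpha1:.3f}, Block2 = {block2:.3f}, Block3 = {block3:.3f}, beta = {beta:.4f}")
print(f"averaged intercept = {alpha_star:.3f} (SE {se_alpha_star:.4f})")
```

The weights vector expresses the averaged intercept as a linear combination of the estimated parameters, which is why its SE needs the covariances between the intercept and block estimates and not only their individual SEs.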

Where it is appropriate, the intra-block model is a valid alternative to multi-stratum 
ANOVA (see Sections 7.5 and 11.6). In principle, we prefer the multi-stratum ANOVA 
approach because it explicitly accounts for the different status of structural and explana- 
tory terms in the model. In practice, it can be difficult to extract regression models (with 
SE) from multi-stratum ANOVA output, and so in these cases, we would accept the intra- 
block analysis. Where the intra-block model is not appropriate, the linear mixed models 
discussed in Chapter 16 give a more general and flexible approach to regression modelling 
when structure is present. 



15.4 Analysis of Covariance: A Special Case of Regression with Groups 

The analysis of covariance (ANCOVA, sometimes also known as ANOCOVA) is used 
to incorporate a limited form of regression with groups into the analysis of a designed 
experiment to adjust treatment comparisons for the presence of quantitative extraneous 
variables. This is separate from, and often in addition to, the inclusion of structure dis- 
cussed in Section 15.3. In this context, a covariate is an explanatory variate thought to 
influence the response that was not taken into account in the design of the experiment. 
Adjusting for this variate ensures that comparisons between treatments are not biased by 
its presence and may reduce uncertainty in estimates of group differences. For example, 
consider a field experiment designed to compare yield response to drought. If it is thought 
that soil depth varies across the field, and that this may affect response to drought, then 
the soil depth can be measured within each experimental plot and used as a covariate to 
account for variation in drought response that is unrelated to the experimental treatments. 
However, the ANCOVA approach is sensible only if a parallel lines model is a good fit to 
the data and if the covariate values are unrelated to the groups. For example, consider the 
scenarios shown in Figure 15.11. 

In Figure 15.11a, the effect of the covariate is the same across all three groups, and so the 
parallel lines model describes the response well. In this case, the differences among the 
three groups can be sensibly summarized by comparisons at any given value of the covari- 
ate; in practice comparisons are usually made at the covariate sample mean. In contrast, 
in Figure 15.11b, the effect of the covariate differs between the groups, so that differences 
among groups are highly dependent on the value of the covariate. In this case, we cannot 
summarize treatment differences without also taking into account the covariate value; 
here traditional ANCOVA, which is based implicitly on the assumption of parallel lines, is 
inappropriate and a separate lines model should be fitted and reported. In Figure 15.11c,
although a parallel lines model appears appropriate, the range of the covariate differs
between the groups, and this can lead to problems of both estimation and interpretation,
as differences in response can be attributed to either the covariate or the treatment groups.
If the differences in the covariate values are intrinsic to the groups, i.e. if they would be
replicated in other studies, then adjustment to a common covariate value might give a
misleading impression of group differences. On the other hand, if we do want to adjust for
the covariate but there is no overlap between the groups, then it is difficult to establish an
appropriate value of the covariate at which to make comparisons, as this requires
extrapolation for one or more groups. Fitting a separate groups model (Section 15.1.3) with the
covariate as the response can be used to detect this situation.

FIGURE 15.11

Adjusting treatment differences using a covariate with three groups showing predictive model (line): (a)
common treatment difference for all covariate values; (b) treatment difference depends on covariate value; (c)
covariate value dependent on group.

Within the literature on analysis of designed experiments, a tradition of ANCOVA has
developed that largely ignores the wider context of regression with groups. One reason for
this is the use of multi-stratum ANOVA to account for structure, which can easily
incorporate explanatory variates using a parallel lines model, but cannot easily accommodate a
separate lines model. Conversely, some books consider ANCOVA to be nothing more than
regression with groups, but this approach often misses the nuances associated with
structure in designed experiments. We recommend a hybrid approach, which we now outline
before presenting an example. Remember that the aim of ANCOVA is to present group
comparisons adjusted for covariate effects.

The first step is to test whether the covariate is related to the groups, incorporating any
structure into the analysis. This can be done with multi-stratum ANOVA for balanced
designs, and with intra-block analysis (Section 11.6.1) or linear mixed models (Chapter 16)
for unbalanced designs. If the covariate does differ systematically between groups, then
you should ask why, what the implications for interpretation are, and whether ANCOVA
is a sensible approach. If there is no such relationship, or if you decide to proceed
anyway, then the next step is to consider whether a parallel lines model is plausible. If the
number of observations within each group is reasonably large (> 5), it might help to fit a
separate lines model and to test explicitly for group differences in response to the
covariate, again incorporating any structure. If the number of observations per group is small
(< 5), then the fitted lines for individual groups are likely to be unreliable, so this step
may be omitted. In either case, it usually helps to plot the response against the covariate
with groups indicated (as in Figure 15.11) to give a visual assessment of the
relationship, and of the extent of overlap in covariate values among groups. If the parallel lines
model still seems plausible, it can then be fitted to obtain estimates of group differences
(still incorporating any structure in the model). The covariate is always fitted first in the
parallel lines model so that differences between groups are assessed after the covariate
has been taken into account. The fitted model and residual plots should be assessed to
check for any evidence of model misspecification before the results are accepted and
interpreted.



EXAMPLE 15.4A: THOUSAND GRAIN WEIGHTS* 

A field experiment was done to investigate the impact of growth regulator (+/-) on 
seed production for two varieties of oilseed rape (B or N) in a CRD with six replicates. 
Unfortunately, pigeons grazed some parts of the trial in early spring, and it was thought 
that this damage might affect plant growth and seed development. The extent of dam- 
age to each plot was recorded as percentage area grazed (variate Damage), to the nearest 
10%, and the aim of analysis was to make treatment comparisons after accounting for 
bird damage, if possible. The response is thousand grain weight (abbreviated as TGW, 
held in variate TGW). The treatment combinations form the four factorial combinations 
of the two growth regulators and two varieties and are represented here by a single 
factor (Trt with groups labelled as +B, +N, -B, -N). The data are in Table 15.15 and file
TGW.DAT.

Figure 15.12 plots variate TGW against variate Damage, with a unique symbol for 
each group. The range of pigeon damage clearly differs between treatments, with the 
second (+N) and third (-B) groups having large and intermediate amounts of damage, 
respectively. 

The first model fitted was used to investigate the relationship between the covariate 
and the treatment groups formally, as 

Response variate: Damage 

Explanatory component: [1] + Trt 



TABLE 15.15

Thousand Grain Weight (Variate TGW) and Pigeon Damage (% Plot Grazed, Variate Damage)
for a Field Experiment Comparing Two Varieties N and B (Factor Variety) with (+) or without (-)
Growth Regulator (Factor GR) (Example 15.4 and File tgw.dat). Factor Trt Gives a Single Code
for Each of the Four Treatment Combinations

Plot  GR  Variety  Trt  Damage  TGW        Plot  GR  Variety  Trt  Damage  TGW
1     +   N        +N   60      3.342      13    +   N        +N   60      3.150
2     -   B        -B   30      3.185      14    -   N        -N   60      3.436
3     -   N        -N   40      3.997      15    +   N        +N   50      3.793
4     +   N        +N   30      4.111      16    +   N        +N   40      3.937
5     +   B        +B   20      3.783      17    -   N        -N   40      3.901
6     -   B        -B   20      3.302      18    -   B        -B   30      3.357
7     -   N        -N   0       4.807      19    -   B        -B   30      3.562
8     -   N        -N   0       4.451      20    -   B        -B   30      3.338
9     -   B        -B   30      3.419      21    +   B        +B   0       3.749
10    +   B        +B   40      3.295      22    +   B        +B   30      3.138
11    +   B        +B   50      3.169      23    +   B        +B   0       3.756
12    +   N        +N   50      3.591      24    -   N        -N   0       5.019

FIGURE 15.12

Thousand grain weight (TGW) with separate lines model (line) for pigeon damage (% plot grazed) with treatment
groups (• +B, ■ +N, ▲ -B, ▼ -N) (Example 15.4A).



This model suggests there may be differences between treatment groups ($F_{3,20}$ = 2.642,
P = 0.077), but it is not conclusive. A biological assessment of the matter concluded that
the intention was to compare seed quality in the absence of pigeon grazing, so
correction for such damage is appropriate in this context even if grazing did differ between
treatments. The second model fitted was a separate lines model with the covariate and
the treatment groups, specified as

Response variate: TGW

Explanatory component: [1] + Damage + Trt + Damage.Trt

The variance ratio for the term Damage.Trt, $F_{3,16}$ = 2.710 (P = 0.080), suggested some
differences in slope between treatments, and the fitted lines are shown in Figure 15.12.
The slope for the third group (-B) is positive whereas those for the other treatments are
negative, but this group has such a small range of grazing damage (five plots have 30%
grazing and one plot 20% grazing) that this line cannot be considered reliable. A parallel
lines model was therefore deemed plausible, and the effect of the treatments was tested.
The incremental F-statistic for the factor Trt ($F_{3,19}$ = 28.874, P < 0.001) indicated large
treatment differences after accounting for grazing damage. Estimates from the parallel
lines model are shown in Table 15.16, and this predictive model is written in mathematical
form, with first-level-zero parameterization, as

$$\hat{\mu}_j(\mathit{Damage}) = (\hat{\alpha}_1 + \hat{\nu}_j) + \hat{\beta}\,\mathit{Damage},$$

where $\hat{\mu}_j(\mathit{Damage})$ represents the predicted TGW for the jth treatment group ($j = 1 \ldots 4$)
with grazing damage Damage. Parameter $\hat{\alpha}_1$ is the estimated intercept for the first
treatment (+B), $\hat{\nu}_j$ is the estimated difference in intercept for the jth treatment (relative to the
first) and $\hat{\beta}$ is the estimated slope associated with grazing damage.

TABLE 15.16

Parameter Estimates with Standard Errors (SE), t-Statistics (t)
and Observed Significance Levels (P) for Parallel Lines Model
for Thousand Grain Weight with Treatment Groups (Factor Trt)
Adjusted for Pigeon Grazing (Variate Damage) (Example 15.4A)

Term      Parameter   Estimate   SE        t         P
[1]       α1           3.926     0.0954    41.132    < 0.001
Damage    β           -0.0190    0.00237   -8.018    < 0.001
Trt+B     ν1           0         —         —         —
Trt+N     ν2           0.648     0.1249     5.188    < 0.001
Trt-B     ν3          -0.026     0.1106    -0.235    0.817
Trt-N     ν4           0.787     0.1099     7.158    < 0.001


TABLE 15.17

Predicted Thousand Grain Weight with Standard Error (SE)
from Two Explanatory Models for Four Treatment Groups
with 30.83% Grazing Damage per Plot (Example 15.4A)

             (a) With Covariate       (b) Without Covariate
Treatment    Prediction   SE          Prediction   SE
+B           3.339        0.0797      3.482        0.1586
+N           3.987        0.0881      3.654        0.1586
-B           3.313        0.0780      3.361        0.1586
-N           4.126        0.0797      4.268        0.1586


Predictions from this model evaluated at the average damage of 30.83% grazing are
shown in Table 15.17a. The LSDs for these predictions (at significance level 0.05 with
19 df) range between 0.2301 and 0.2615. It is clear that groups 2 (+N) and 4 (-N) give
greater seed weights than the other groups (+B, -B), with no significant differences within
each of these two sets. Given our knowledge of factorial structures (Chapter 8), we should
be able to clarify our inferences in terms of the underlying factors, variety and use of
growth regulator, and we examine this further in Example 15.4B. For now, we compare
the predictions from the ANCOVA (parallel lines model) with those from the separate
groups model, which ignores the covariate (Table 15.17b). Predictions from the parallel
lines model have been shifted according to the amount of grazing observed: the predicted
TGW for the +N group, which was more heavily grazed, is adjusted upwards and the
predicted TGW for the +B and -N groups, which were less heavily grazed, are adjusted
downwards. The prediction SEs and SEDs are substantially smaller in the parallel lines
model because the covariate has accounted for some of the variation among replicate plots.
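
As a check on how the predictions in Table 15.17a arise from the parallel lines model, the following minimal Python sketch evaluates the fitted equation at the average damage of 30.83%, using the estimates quoted in Table 15.16. The function and variable names are illustrative choices, not part of the original analysis, and because the tabulated estimates are rounded the results agree with Table 15.17a only to about two decimal places.

```python
# Parameter estimates quoted in Table 15.16 (first-level-zero parameterization).
alpha1 = 3.926            # intercept for the first treatment group (+B)
beta = -0.0190            # common slope for grazing damage
nu = {"+B": 0.0, "+N": 0.648, "-B": -0.026, "-N": 0.787}  # intercept differences


def predict_tgw(treatment, damage):
    """Predicted TGW for one treatment group under the parallel lines model."""
    return alpha1 + nu[treatment] + beta * damage


mean_damage = 30.83  # average grazing damage (%) used for Table 15.17a
predictions = {trt: predict_tgw(trt, mean_damage) for trt in nu}

for trt, value in predictions.items():
    print(f"{trt}: {value:.3f}")
```

Because the slope is common to all groups, the difference between any two predictions is the same at every damage value; only the level of the predictions depends on the damage value chosen.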

Within the context of designed experiments, the traditional ANCOVA procedure differs
from that given above in several ways. First, it is not usual to fit the separate lines model to
investigate formally whether there is evidence against the parallel lines model. However,
we find this step useful, and recommend it where there are sufficient observations within
groups to make the analysis meaningful. Second, it is usual to present a single composite
ANOVA table with F-tests for treatment groups after elimination of the covariate and for
the covariate after elimination of the treatment groups. This ANOVA table is amalgamated
from two different sequences of models: one with the covariate fitted first and the
other with the factor(s) fitted first. In contrast, we have just used the sequential ANOVA
table obtained by fitting the covariate first. Both approaches are correct, but we find our
approach more straightforward. Another example of ANCOVA, this time for a structured
experiment analysed by linear mixed models, is given in Example 16.2.

15.5 Complex Models with Factors and Variates

In general, we might have several explanatory factors and variates and the corresponding
models become considerably more complex. The aims of analysis stay the same, namely, to
find a predictive model that uses as few parameters as possible to describe the response.
The automatic model selection techniques described in Section 14.9 can be used, but they
must be modified somewhat, and these modifications are described in Section 15.5.1. Once
we have identified a predictive model, then we need to decide which predictions to make,
and this is discussed in Section 15.5.2.



15.5.1 Selecting the Predictive Model 

In Chapters 8, 11, 14 and 15, we have used various different strategies for model selection, 
so here we try to set them out in a coherent framework. In all cases, we obey the principle 
of marginality when adding or dropping terms, and if our explanatory model contains 
terms associated with the structural component then we fit the structural terms first and 
do not test them (see Section 15.3). Our strategy will depend on whether the explanatory 
variables are orthogonal (see Section 11.1), and whether the full model is well defined. 
The full model consists of meaningful terms constructed from the set of explanatory 
variables and appropriate combinations. The full model is considered well defined if 
it was specified during the design phase of the study and has a reasonable number of 
residual df. 

When the explanatory variables are orthogonal, we can use a single sequential ANOVA 
table for model selection. This usually only occurs in the context of a designed study, 
where the full model is well defined. We then start with the full model and progressively 
test and drop non-significant terms (respecting marginality) to identify the predictive 
model (e.g. Section 8.3). 

When the explanatory variables are not orthogonal, there may be many sequential 
ANOVA tables. We will consider separately the cases when the full model is well defined 
and those where it is not. 

If the full model is well defined, then we again fit the full model and progressively 
test and drop non-significant terms, respecting marginality. If there are few sequential 
ANOVA tables, then we might form all of them. If the residual df are large, then there is 
no need to refit the model if we can deduce all the required information from these ini- 
tial tables. If there are many different sequential ANOVA tables, then forming them all is 
impractical and it will usually be easier to use marginal F-tests to progressively simplify 
the full model. At each step of the process, we then identify the terms that can be dropped 
(respecting marginality), and form a marginal F-test for each of these terms. We drop the 
least significant term, i.e. the one with the largest observed significance level (P) subject 
to P > 0.05, and then refit the model. We repeat this process until no further terms can be 
eliminated. This is the backward elimination procedure of Section 14.9. 

If the full model is not well defined, then we must start from a simpler model and 
consider both adding and dropping terms as for the stepwise regression procedures 
described in Section 14.9, but in these more complex models we must now also respect 
marginality. 

These latter two approaches suggest that the automatic selection procedures described
in Section 14.9 are more widely useful, although the caveats stated there still apply and
some modification is required to account for model terms involving explanatory factors.
In particular, we now require automatic selection procedures to respect marginality when 
terms are added or dropped. Some caution is also required for model terms with more 
than 1 df. The numerator df for an F-test is always the change in model df obtained on 
adding or dropping the term. Since the critical value of an F-distribution decreases for 
larger numerator df, it is difficult to define a single threshold in terms of a critical value, 
especially where the df for the model terms cover a large range. In this case, it is usually 
more sensible to define thresholds in terms of the observed significance level, i.e. SLE or 
SLS (defined in Section 14.9). 
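
The marginality requirement can be made concrete with a small Python sketch that lists which terms are eligible for testing or dropping at a given step. This is only an illustration under stated assumptions: terms are written with the dot operator, the constant [1] is left out because it is never tested, the helper names `components` and `droppable_terms` are hypothetical, and the term names are borrowed from the thousand grain weight data (covariate Damage with factors Variety and GR).

```python
def components(term):
    """Split a model term such as 'Damage.Variety' into its set of variables."""
    return frozenset(term.split("."))


def droppable_terms(model_terms):
    """Return the terms that can be tested or dropped while respecting marginality.

    A term is eligible only if no other term in the model contains all of its
    variables, i.e. it is not marginal to a higher-order term still in the model.
    """
    eligible = []
    for term in model_terms:
        is_marginal = any(
            other != term and components(term) < components(other)
            for other in model_terms
        )
        if not is_marginal:
            eligible.append(term)
    return eligible


full_model = [
    "Damage", "Variety", "GR", "Variety.GR",
    "Damage.Variety", "Damage.GR", "Damage.Variety.GR",
]
print(droppable_terms(full_model))   # only the three-way term is eligible

reduced = [t for t in full_model if t != "Damage.Variety.GR"]
print(droppable_terms(reduced))      # now the three two-way terms are eligible
```

An automatic procedure would compute a marginal F-test only for the terms returned at each step, apply the SLE or SLS threshold to their observed significance levels, and then refit.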



EXAMPLE 15.4B: THOUSAND GRAIN WEIGHTS*

We now reanalyse the TGW data to recognize and exploit the crossed structure within
the treatment groups. The four treatment groups (+B, +N, -B, -N) form a factorial set
related to the factors Variety (with two levels, B and N) and GR, indicating the presence
or absence of growth regulator (with two levels + and -). It is appropriate to replace the
term Trt by a crossed structure Variety*GR that partitions the treatment effects into the
main effect of variety, the main effect of growth regulator and their interaction (see
Section 8.2). For a full analysis of this data set, we therefore repeat the procedure of
Example 15.4A, but making this replacement throughout. First, we check whether the
covariate is related to the treatments, using model

Response variate: Damage

Explanatory component: [1] + Variety*GR

The main effects are not significant with (by numerical coincidence) both F-statistics
equal to $F_{1,20}$ = 1.865 (both P = 0.187), with the interaction of borderline significance
($F_{1,20}$ = 4.197, P = 0.054). In combination with the exploratory graphs in Figure 15.12,
the reasoning given in Example 15.4A still stands, and we proceed with fitting the
separate lines model for grain weight, specified as

Response variate: TGW

Explanatory component: [1] + Damage + Variety + GR + Variety.GR
                       + Damage.Variety + Damage.GR + Damage.Variety.GR

The covariate is fitted first, followed by terms associated with the treatment groups,
with terms combining the covariate and the treatment groups at the end. Table 15.18
shows the observed significance levels of marginal F-tests from a sequence of models
for this data, with this model labelled as Model 1.

TABLE 15.18

Observed Significance Level (P) for Marginal F-Tests in a Sequence of
Models for Thousand Grain Weights (Example 15.4B)

                                           P
Term                 Model 1   Model 2   Model 3   Model 4   Model 5
[1]                  —         —         —         —         —
Damage               —         —         —         —         —
Variety              —         —         —         —         —
GR                   —         —         —         0.839     *
Variety.GR           —         0.806     0.495     *         *
Damage.Variety       —         0.015     0.030     0.020     0.015
Damage.GR            —         0.160     *         *         *
Damage.Variety.GR    0.567     *         *         *         *

Note: — = term in model but not eligible for testing, * = term omitted from model.

All other terms are marginal to Damage.Variety.GR, so as a first step we can test
only this term. We find it non-significant ($F_{1,16}$ = 0.341, P = 0.567) and so drop it
and refit to obtain Model 2 of Table 15.18. In this ANCOVA setting, we are first
interested in whether we can reduce to a parallel lines model, so we next examine the
terms Damage.Variety and Damage.GR. We can omit term Damage.GR ($F_{1,17}$ = 2.163,
P = 0.160) to obtain Model 3, but cannot then omit term Damage.Variety ($F_{1,18}$ = 5.581,
P = 0.030). Figure 15.12 suggests that the individual slopes for variety B are both less
steep than those for variety N, so by using the crossed structure we can now detect
this difference that was previously masked. We therefore retain the Damage.Variety
term and can no longer regard this as a traditional ANCOVA. However, we can
simplify the model further as term Variety.GR is eligible for testing but is not significant
($F_{1,18}$ = 0.485, P = 0.495). We drop the latter and refit the model (to obtain Model 4),
and can then test term GR, which is not significant ($F_{1,19}$ = 0.043, P = 0.839). Dropping
GR gives Model 5, and no further simplification is possible. The predictive model is
therefore a separate lines model in terms of explanatory variate Damage and the factor
Variety, written as

Explanatory component: [1] + Damage + Variety + Damage.Variety

We can represent this predictive model in mathematical form, using first-level-zero
parameterization, as

$$\hat{\mu}_r(\mathit{Damage}) = (\hat{\alpha}_1 + \hat{\nu}_r) + (\hat{\beta}_1 + \hat{\eta}_r)\,\mathit{Damage} .$$

Here, $\hat{\mu}_r(\mathit{Damage})$ represents the predicted TGW for the rth variety (1 = B, 2 = N) with
grazing damage equal to Damage. Parameter $\hat{\alpha}_1$ = 3.744 (SE 0.1007) represents the
intercept for variety B and $\hat{\nu}_2$ = 1.051 (SE 0.1345) is the difference in intercept for variety N
(recall $\hat{\nu}_1$ = 0). Similarly, parameter $\hat{\beta}_1$ = -0.0125 (SE 0.00344) represents the slope for
variety B, and $\hat{\eta}_2$ = -0.0108 (SE 0.00403) is the difference in slope for variety N, with
$\hat{\eta}_1$ = 0. The predictive model for the two varieties is shown in Figure 15.13 and can be
written as

Variety B: $\hat{\mu}_B(\mathit{Damage}) = 3.744 - 0.0125\,\mathit{Damage}$

Variety N: $\hat{\mu}_N(\mathit{Damage}) = 4.795 - 0.0233\,\mathit{Damage}$

We can conclude that there appears to be no effect of growth regulator on TGW, but
that TGW is decreased by pigeon damage, and that there is a strong varietal difference
which is also affected by the amount of pigeon damage. Although variety N always
had a larger TGW than variety B in this experiment, it is also more affected by pigeon
damage. For a 10% increase in plot damage, TGW is reduced by 0.125 units for variety
B, but by 0.233 units for variety N. This analysis suggests that the ANCOVA analysis of
Example 15.4A missed some of the nuances in the results by ignoring the structure of
the treatment groups.

FIGURE 15.13

Thousand grain weight (TGW) with separate lines model (—) for pigeon damage (% plot grazed) with
varieties B (•) and N (■) (Example 15.4B).
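
The fitted lines of Example 15.4B can be reproduced from the first-level-zero estimates with a few lines of Python. This is a minimal sketch: the variable names are illustrative, and the damage values (0, 20, 40, 60%) are arbitrary choices used only to show how the varietal difference shrinks as damage increases.

```python
# Estimates quoted in Example 15.4B (first-level-zero parameterization).
alpha1, nu2 = 3.744, 1.051       # intercept for B, difference in intercept for N
beta1, eta2 = -0.0125, -0.0108   # slope for B, difference in slope for N

# (intercept, slope) for each variety
lines = {
    "B": (alpha1, beta1),
    "N": (alpha1 + nu2, beta1 + eta2),
}


def predict_tgw(variety, damage):
    """Predicted TGW for a variety at a given grazing damage (%)."""
    intercept, slope = lines[variety]
    return intercept + slope * damage


for damage in (0, 20, 40, 60):   # illustrative damage values only
    diff = predict_tgw("N", damage) - predict_tgw("B", damage)
    print(f"Damage {damage:>2}%: N - B = {diff:.3f}")
```

Unlike the parallel lines model, the difference between varieties here depends on the damage value at which the comparison is made, which is why a single summary comparison between groups can be misleading for a separate lines model.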



15.5.2 Evaluating the Response: Predictions from the Fitted Model 

In Section 11.2.5, we described the process of prediction for models containing explana- 
tory factors. All of the considerations discussed there also apply here, but now we must 
also incorporate explanatory variates into the process. We start with the predictive model. 
When all of the explanatory variables are factors, we form predicted values for each com- 
bination of the factor levels, and can then take marginal means (with care, as described in 
Section 11.2.5) to summarize differences between groups. When there are also variates in 
the model, the considerations are slightly different. It can be helpful to just present the fitted 
line for each group either in mathematical form (as at the end of Example 15.4B) or graphi- 
cally with confidence intervals. Questions about differences in slope between groups are 
often best answered by directly testing these differences. Comparisons between groups 
can be more difficult. In a parallel lines model, comparisons between groups will be the 
same for any fixed value of the explanatory variates, and usually predictions are made at 
the sample mean of the variates so that the predictions are readily interpreted. In a separate 
lines model, differences between groups change according to the covariate value. Unless 
prediction is required at specific variate values, summary comparisons across groups are 
unlikely to be helpful and may be misleading. In either type of model, if the explanatory 
variates are distributed differently between groups, then it may be misleading to compare 
groups at a common value of the explanatory variates, as this may create predictions for 
combinations of variables that would never normally occur. 

When making predictions from more complex models, you should therefore be careful 
to check that your predictions are meaningful and that any comparisons are interpretable. 



15.6 The Connection between Factors and Variates 

Up to this point, we have considered factors and variates as intrinsically different types of
explanatory variables. In fact, a factor can be considered as a set of covariates with a particular
structure, and in this section, we explain the connection in some detail. For simplicity, we
start with the separate groups model of Section 15.3 written in mathematical form as

$$y_{jk} = \alpha_1 + \nu_j + e_{jk} . \qquad (15.5)$$

Written in this form, we have several models; one for each group. To rewrite this as a single
model in terms of the full set of parameters, we define a set of dummy variates. For a
factor with t groups, we construct t dummy variates, labelled by subscript l = 1 ... t. The lth
dummy variate corresponds to the lth group and its values are labelled by both the group,
l, and the observation, jk, as $d_{ljk}$. The values of this dummy variate are 1 for observations
belonging to the lth group, i.e. where j = l, and zero otherwise.



EXAMPLE 15.5A: CALCIUM POT TRIAL* 

Consider the pot trial data of Examples 3.4 and 4.1. Four relative concentrations of
calcium (A = 1, B = 5, C = 10, D = 20) were each applied to five individual plants growing in
pots arranged as a CRD. At the end of the experiment, the total root length (cm) in each
of the 20 pots was measured. The data and the set of dummy variates associated with
the Calcium factor are in Table 15.19 and can be found in file calcium2.dat. The first
variate, labelled d1, takes value 1 for observations with treatment A (j = 1), and value
0 elsewhere. The remaining variates, d2, d3 and d4, are formed similarly in relation to
calcium treatments B, C and D, respectively.



TABLE 15.19

Calcium Pot Trial Data from Table 5.6 with Additional Labelling
in Terms of Calcium Treatment (j) and Replicate (k) and Dummy
Variates d1-d4 Corresponding to the Calcium Factor (Example 15.5A)

Pot   Calcium   Length   j   k   d1   d2   d3   d4
 1    D         47       4   1   0    0    0    1
 2    A         58       1   1   1    0    0    0
 3    B         80       2   1   0    1    0    0
 4    C         49       3   1   0    0    1    0
 5    D         49       4   2   0    0    0    1
 6    A         52       1   2   1    0    0    0
 7    D         45       4   3   0    0    0    1
 8    A         74       1   3   1    0    0    0
 9    C         70       3   2   0    0    1    0
10    B         68       2   2   0    1    0    0
11    A         58       1   4   1    0    0    0
12    C         72       3   3   0    0    1    0
13    A         79       1   5   1    0    0    0
14    D         48       4   4   0    0    0    1
15    C         74       3   4   0    0    1    0
16    B         72       2   3   0    1    0    0
17    D         38       4   5   0    0    0    1
18    C         71       3   5   0    0    1    0
19    B         74       2   4   0    1    0    0
20    B         85       2   5   0    1    0    0

We can now rewrite the separate groups model in terms of the dummy variates as

$$y_{jk} = \alpha_1 + \nu_1 d_{1jk} + \nu_2 d_{2jk} + \dots + \nu_t d_{tjk} + e_{jk} . \qquad (15.6)$$

Because each unit belongs to exactly one group, only one of the values $d_{1jk}$ to $d_{tjk}$ is 1, with
the values of the other dummy variates being 0. For example, for the first observation in
group 2, we have $d_{221} = 1$ and $d_{l21} = 0$ for $l \neq 2$. Hence

$$\begin{aligned}
y_{21} &= \alpha_1 + (\nu_1 \times 0) + (\nu_2 \times 1) + \dots + (\nu_t \times 0) + e_{21} \\
       &= \alpha_1 + \nu_2 + e_{21} ,
\end{aligned}$$

and so the model for the first observation in group 2 (or any other observation in any other
group) is equivalent to the form given in Equation 15.5. In the new form of Equation 15.6,
often called the dummy variate representation, the separate groups model looks like a
MLR in terms of the dummy variates d1 ... dt and can be written in symbolic form as

Explanatory component: [1] + (d1 + ... + dt)

We use parentheses () to emphasize the associations within the set of dummy variates that
represent a factor. As explained previously, this model is over-parameterized. In this form,
there is no information left after the first t - 1 dummy variates have been fitted, leading
to last-level-zero constraints being imposed by default. We prefer to use first-level-zero
constraints, so we impose $\nu_1 = 0$ and omit the first dummy variate from the model, giving
symbolic form

Explanatory component: [1] + (d2 + ... + dt)

The main difference between a MLR and this model is that we add all of the dummy
variates corresponding to a factor (those in parentheses) into the model as a group, to obtain a
combined incremental sum of squares for the term, rather than adding the dummy
variates individually.
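
The construction of dummy variates from a factor is mechanical, and a short Python sketch may help to fix ideas. It rebuilds the columns d1 to d4 of Table 15.19 from the Calcium labels; the function name `dummy_variates` is a hypothetical helper, not something defined in the text. The final check shows why the full set is over-parameterized: every row sums to 1, so the dummy variates add up to the constant column [1].

```python
# Calcium treatment for pots 1..20, transcribed from Table 15.19.
calcium = list("DABCDADACBACADCBDCBB")
levels = ["A", "B", "C", "D"]   # factor levels in order, so A is the first level


def dummy_variates(factor_values, levels):
    """Return one 0/1 column per level: column l is 1 where the unit is in group l."""
    return {
        level: [1 if value == level else 0 for value in factor_values]
        for level in levels
    }


d = dummy_variates(calcium, levels)

# Rows of (d1, d2, d3, d4), one per pot, as laid out in Table 15.19.
rows = list(zip(*(d[level] for level in levels)))
print(rows[0])                              # pot 1 received treatment D
print(all(sum(row) == 1 for row in rows))   # each unit belongs to exactly one group
```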



EXAMPLE 15.5B: CALCIUM POT TRIAL* 

The separate groups model for the calcium pot trial data can be specified with the 
dummy variates given in Table 15.19, with first-level-zero parameterization, as 

Explanatory component: [1] + (d2 + d3 + d4)

The model SS is 2462.95, giving the ANOVA table in Table 15.20 which matches the 
ANOVA table previously obtained (see Table 5.5). 
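
The equivalence between the factor and its dummy variates can be verified numerically. The sketch below regresses root length on the constant and the dummy variates d2, d3 and d4 by solving the normal equations in plain Python. The text does not say how the fit is computed, so the use of the normal equations and the helper `solve` are assumptions made purely for illustration; the data are transcribed from Table 15.19.

```python
# Data for pots 1..20, transcribed from Table 15.19.
calcium = list("DABCDADACBACADCBDCBB")
length = [47, 58, 80, 49, 49, 52, 45, 74, 70, 68,
          58, 72, 79, 48, 74, 72, 38, 71, 74, 85]

# Design columns: constant [1] plus dummy variates d2, d3, d4 (first level A omitted).
X = [[1.0] + [1.0 if c == lev else 0.0 for lev in "BCD"] for c in calcium]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


p = len(X[0])
xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
xty = [sum(row[i] * y for row, y in zip(X, length)) for i in range(p)]
estimates = solve(xtx, xty)   # least squares estimates from (X'X) b = X'y

fitted = [sum(b * x for b, x in zip(estimates, row)) for row in X]
mean_y = sum(length) / len(length)
model_ss = sum((f - mean_y) ** 2 for f in fitted)
resid_ss = sum((y - f) ** 2 for y, f in zip(length, fitted))
f_ratio = (model_ss / 3) / (resid_ss / 16)   # 3 model df, 16 residual df

print([round(b, 2) for b in estimates])
print(round(model_ss, 2), round(resid_ss, 2), round(f_ratio, 3))
```

The first estimate is the mean for treatment A and the remaining three are differences from A, as expected under first-level-zero constraints; the sums of squares and variance ratio agree with the values quoted for Table 15.20.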

TABLE 15.20

Sequential ANOVA Table for the Separate Groups Model Fitted to the
Calcium Pot Trial Data Using Dummy Variates d2-d4 (Example 15.5B)

                     Incremental   Incremental   Mean      Variance
Terms Added          df            SS            Square    Ratio      P
+ (d2 + d3 + d4)      3            2462.95       820.98    10.753     < 0.001
Residual             16            1221.60        76.35
Total                19            3684.55

SS = sum of squares.

The dummy variate representation can be used whenever a factor appears in a model.
So, for example, the separate lines model for a factor grp with three groups (coded as
dummy variates d1, d2 and d3) can be written in symbolic form as either

Explanatory component: [1] + x + grp + x.grp

or equivalently, explicitly imposing first-level-zero constraints, as

Explanatory component: [1] + x + (d2 + d3) + (x.d2 + x.d3)

Again, terms within parentheses are added into the model as a group. Terms of the
form x.d2, composed of two variates, are equivalent to a single variate calculated by
multiplication of the values of the contributing variates together for each observation.
We now also have two ways to write our separate lines model in mathematical form. The
following two forms are equivalent, and both use first-level-zero constraints, with $\nu_1 = 0$
and $\eta_1 = 0$:

$$y_{jk} = (\alpha_1 + \nu_j) + (\beta_1 + \eta_j)\,x_{jk} + e_{jk} ,$$

$$y_{jk} = (\alpha_1 + \nu_1 d_{1jk} + \nu_2 d_{2jk} + \nu_3 d_{3jk}) + (\beta_1 + \eta_1 d_{1jk} + \eta_2 d_{2jk} + \eta_3 d_{3jk})\,x_{jk} + e_{jk} .$$

Similarly to the symbolic form, composite terms of the form $d_{2jk} x_{jk}$ can be considered as a
single value calculated by multiplication of the component values.



EXAMPLE 15.1K: STAND DENSITY OF MIXED NOTHOFAGUS FOREST PLOTS

The three dummy variates (d1, d2 and d3) required to represent the Type factor in the
stand density data set are presented in Table 15.21 and can be found in file forest2.dat.
Calculation of the composite terms logQD.d1, logQD.d2 and logQD.d3 is also shown in
Table 15.21 as the product of the variates d1, d2 and d3, respectively, with logQD.

The separate lines model for these data can be written in symbolic form either in
terms of the Type factor as

Explanatory component: [1] + logQD + Type + logQD.Type

or using the dummy variates and first-level-zero constraints as

Explanatory component: [1] + logQD + (d2 + d3) + logQD.(d2 + d3)
                     = [1] + logQD + (d2 + d3) + (logQD.d2 + logQD.d3)

Fitting the model in terms of factor Type gave the ANOVA table shown in Table 15.5.

Using the dummy variates, and fitting the terms in parentheses together, gives the same
sequential ANOVA table, as shown in Table 15.22. The parameter estimates are the same
as those obtained in Table 15.3 in both cases, although now labelled by the dummy
variates rather than by the factor levels.
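
The composite terms of Table 15.21 are simply unit-by-unit products, as the following Python sketch shows for one plot of each forest type. The level order (Coigue, Rauli, Roble) is inferred from the pattern of d1, d2 and d3 in Table 15.21, and the function name `composite_terms` is a hypothetical helper introduced for illustration.

```python
# One plot per forest type, transcribed from Table 15.21: (plot, Type, logQD).
plots = [(1, "Rauli", 2.57), (10, "Roble", 2.48), (29, "Coigue", 3.10)]

# Level order implied by the dummy variates d1, d2, d3 in Table 15.21.
levels = ["Coigue", "Rauli", "Roble"]


def composite_terms(forest_type, log_qd, levels):
    """Return (d1, d2, d3) and the products (logQD.d1, logQD.d2, logQD.d3) for one plot."""
    dummies = [1 if forest_type == level else 0 for level in levels]
    products = [log_qd * d for d in dummies]
    return dummies, products


for plot, forest_type, log_qd in plots:
    dummies, products = composite_terms(forest_type, log_qd, levels)
    print(plot, forest_type, dummies, products)
```

Each product column equals logQD within its own group and zero elsewhere, so its coefficient acts as a group-specific adjustment to the slope.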

TABLE 15.21

Dummy Variates (d1, d2, d3) and Their Products with Explanatory Variate logQD
(logQD.d1, logQD.d2, logQD.d3) for Three Groups (Factor Type) of Mixed Nothofagus
Forest Calculated for Four Plots in Each Group (Example 15.1K)

Plot   Type     logQD   d1   d2   d3   logQD.d1   logQD.d2   logQD.d3
 1     Rauli    2.57    0    1    0    0          2.57       0
 2     Rauli    2.70    0    1    0    0          2.70       0
 8     Rauli    2.98    0    1    0    0          2.98       0
 9     Rauli    2.44    0    1    0    0          2.44       0
10     Roble    2.48    0    0    1    0          0          2.48
11     Roble    3.12    0    0    1    0          0          3.12
27     Roble    2.67    0    0    1    0          0          2.67
28     Roble    2.53    0    0    1    0          0          2.53
29     Coigue   3.10    1    0    0    3.10       0          0
30     Coigue   3.42    1    0    0    3.42       0          0
40     Coigue   2.65    1    0    0    2.65       0          0
41     Coigue   2.98    1    0    0    2.98       0          0


TABLE 15.22

Sequential ANOVA Table for Separate Lines Model for Logged Stand
Density Using Explanatory Variate logQD and Dummy Variates d2, d3 to
Represent Factor Type (Example 15.1K)

                      Incremental   Incremental   Mean      Variance
Term Added            df            SS            Square    Ratio      P
+ logQD                1            4.5833        4.5833    78.562     < 0.001
+ (d2 + d3)            2            1.9403        0.9701    16.629     < 0.001
+ logQD.(d2 + d3)      2            0.2011        0.1006     1.724     0.193
Residual              35            2.0419        0.0583
Total                 40            8.7667

SS = sum of squares.

The interpretation of factors as a set of dummy variates allows both types of
explanatory variable to be considered within a single framework, which facilitates a unified
mathematical treatment of linear models. The terms [1] and 1, which both represent a
vector with value 1 in all units, can be considered as equivalent. This framework allows
statistical models to be written in matrix notation, and we give a very brief introduction
to this topic below.

15.6.1 Rewriting the Model in Matrix Notation 

So far, we have written our models in terms of individual observations. The use of dummy
variates with matrix notation allows a succinct representation of the model for the whole
set of observations simultaneously. Consider the separate groups model of Equation 15.6,
and suppose we have six observations in each of the t groups. We can write the model for
all observations, ordering by groups and then by observations within groups as

$$\begin{aligned}
y_{11} &= \alpha_1 + \nu_1 d_{111} + \nu_2 d_{211} + \dots + \nu_t d_{t11} + e_{11} \\
y_{12} &= \alpha_1 + \nu_1 d_{112} + \nu_2 d_{212} + \dots + \nu_t d_{t12} + e_{12} \\
       &\;\;\vdots \\
y_{t5} &= \alpha_1 + \nu_1 d_{1t5} + \nu_2 d_{2t5} + \dots + \nu_t d_{tt5} + e_{t5} \\
y_{t6} &= \alpha_1 + \nu_1 d_{1t6} + \nu_2 d_{2t6} + \dots + \nu_t d_{tt6} + e_{t6}
\end{aligned} \qquad (15.7)$$

We can then abbreviate this rather lengthy form using matrix notation. A matrix is
simply a rectangular array of numbers, with rules for addition and multiplication that are
explained in Section C.5. Our model can then be written as

$$\mathbf{y} = \mathbf{X}\boldsymbol{\tau} + \mathbf{e} ,$$

where y is a matrix with N rows and 1 column (a vector of length N) containing the
observations, X is a matrix with N rows and t + 1 columns (an N × (t + 1) matrix) containing the
known coefficients associated with the parameters, τ is a matrix with t + 1 rows and 1
column (a vector of length t + 1) containing the parameters and e is a matrix with N rows
and 1 column (a vector of length N) containing the deviations. These matrices are defined
as follows:

$$
\mathbf{y} = \begin{pmatrix} y_{11} \\ y_{12} \\ \vdots \\ y_{t5} \\ y_{t6} \end{pmatrix} ; \quad
\mathbf{X} = \begin{pmatrix}
1 & d_{111} & d_{211} & \cdots & d_{t11} \\
1 & d_{112} & d_{212} & \cdots & d_{t12} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & d_{1t5} & d_{2t5} & \cdots & d_{tt5} \\
1 & d_{1t6} & d_{2t6} & \cdots & d_{tt6}
\end{pmatrix} ; \quad
\boldsymbol{\tau} = \begin{pmatrix} \alpha_1 \\ \nu_1 \\ \nu_2 \\ \vdots \\ \nu_t \end{pmatrix} ; \quad
\mathbf{e} = \begin{pmatrix} e_{11} \\ e_{12} \\ \vdots \\ e_{t5} \\ e_{t6} \end{pmatrix} .
$$

The rules for matrix multiplication given in Section C.5 mean that this short form expands
to give the full mathematical model in Equation 15.7. The matrix of coefficients, X, is
usually called the design matrix and its columns correspond to the explanatory variates. In
this example, the first column corresponds to the overall constant term, denoted in our
symbolic form as [1], and the 2nd to (t + 1)th columns contain the dummy variates for the t
factor levels, denoted earlier as d1 ... dt. All of the elements in this case are therefore either 0
or 1. For Example 15.5A, with four treatment groups, the 2nd to 5th columns of the design
matrix correspond to the values given in the last four columns of Table 15.19. Another
simple example is the SLR model of Section 12.1, which takes the form

$$
\mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_{n-1} \\ y_n \end{pmatrix} ; \quad
\mathbf{X} = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_{n-1} \\ 1 & x_n \end{pmatrix} ; \quad
\boldsymbol{\tau} = \begin{pmatrix} \alpha \\ \beta \end{pmatrix} ; \quad
\mathbf{e} = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_{n-1} \\ e_n \end{pmatrix} .
$$

We can use this matrix notation for models with any combination of factors and variates.
Using this notation, parameter estimates and SEs can be written in a general form in terms
of matrix operations on the design matrix, X, and the response vector, y, and this notation
is therefore widely used in textbooks on mathematical statistics. We do not go into this
further here but more details can be found in Mead et al. (2012) or Montgomery et al. (2012).
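
To see how the matrix form expands back into one equation per observation, the following Python sketch builds the design matrix for the calcium pot trial of Example 15.5A and multiplies it by a parameter vector. The parameter values are not quoted in the text: they are the treatment means calculated from the data in Table 15.19, re-expressed with the first-level-zero constraint, and the helper `mat_vec` is a hypothetical name used only for this illustration.

```python
# Calcium treatment for pots 1..20, transcribed from Table 15.19.
calcium = list("DABCDADACBACADCBDCBB")
levels = "ABCD"

# Design matrix X: a column of 1s for [1], then the dummy variates d1..d4.
X = [[1] + [1 if c == level else 0 for level in levels] for c in calcium]

# Parameter vector (alpha_1, nu_1, nu_2, nu_3, nu_4) with nu_1 = 0.
# Assumed values: mean of group A, then differences of B, C and D from A,
# computed from the root lengths in Table 15.19.
tau = [64.2, 0.0, 11.6, 3.0, -18.8]


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector: one fitted value per row."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


fitted = mat_vec(X, tau)
print(X[0], round(fitted[0], 1))   # pot 1 (treatment D): alpha_1 + nu_4
```

Each row of X picks out the constant and exactly one group effect, so the product reproduces the separate groups model for every observation at once.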



EXERCISES 

15.1 The biomass data (wet weights in g) for all four sites in the study described in 
Exercise 12.2 (and Exercise 13.4) are held in file allsites.dat (variate ID, factor 
Site, variates Year, WetWeight). Use regression with groups on this combined 
data set to investigate whether the trend found at Hereford is the same as for 
the other three sites. Present a summary of the results from your analysis. 

15.2 In Exercise 12.5, you fitted a SLR to the log-transformed body mass of a sample 
of moths with explanatory variate wing length and found evidence for lack of 
fit in the relationship. However, that SLR ignored the information on the spe- 
cies of each sample that was also recorded (in data file noctuid.dat). Use this 
species information to investigate whether the relationship between log-trans- 
formed body mass and wing length is consistent across species. Test for lack 
of fit in your model and compare your results with those from Exercise 12.5. 
Can you reconcile the two analyses? Specify and interpret your final predictive 
model. 

15.3 Many plant pathogens are dispersed through the crop by rain splash. To
investigate the likely distance of travel, water drops of different sizes (weights)
were dropped from various heights to give different velocities on impact. The
average height of splash was measured for each combination of drop size and
height. File splash.dat contains unit numbers (ID), the weight (variate Weight)
and estimated terminal velocity (variate Velocity) for each run with the mean
splash height (variate MeanHt). The aim of analysis is to predict splash height
from drop velocity on impact. Form groups for the different weight classes and
establish whether a common model across weight classes is appropriate. Would
use of a common line lead to any erroneous conclusions? (We re-visit these data
in Exercise 17.10.)*

15.4 In Example 15.1, we explored models to predict the density of stands of three 
types from sample measurements of quadratic diameter, with models based on 
the log-transform of both the response and explanatory variables. We identified 
a parallel lines model as being most suitable for this data (Examples 15.1E and 
H). Using the results of Section 6.4, rewrite this parallel lines model in terms of 
the stand density and interpret the difference between stand types on this scale. 

15.5 A study was done to investigate the recovery of spring-applied fertilizer N in 
the harvested products of three arable crops (Macdonald et al., 1997): winter 
wheat, oilseed rape and potatoes. The recovery (% of applied N) was measured 
in potato tubers, rapeseed and wheat grain and here we investigate whether 
fertilizer recovery can be predicted by harvest index. The file recovery.dat 
contains sample numbers (ID), crop type (factor Crop), harvest index (vari- 
ate HIndex) and fertilizer recovery rates (variate Recovery, %) from 8 plots of 
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wheat, 4 plots of potatoes and 12 plots of oilseed rape. Plof fhe dafa and discuss 
whefher regression wifh groups is a sensible approach.* * 

15.6 As parf of a sfudy fo quanfify phosphorus (P) use efficiency in crops, dafa on 
fhe increase in Olsen P and increase in fofal P in 52 plofs in long-ferm experi- 
menfs across fhree sifes wifh differenf soil fypes were compiled (Johnston ef al., 
2001). The file p.dat confains index numbers (Plot), fhe site name (factor Site) and 
measuremenfs of increase in Olsen P (variafe IncOlsenP) and increase in fofal P 
(variate IncTotalP) for each plof. Invesfigafe whefher fhe increase in fofal P can 
predicf fhe increase in Olsen P, and whefher fhis relafionship differs befween 
soil fypes (sifes). Find fhe simplesf adequate model fo describe fhese dafa, and 
wrife down and inferpref your final predicfive model.^ 

15.7 The Julian date of fhe lasf record of fhe aphid Myzus ascalonicus (shallof aphid) 
in the Insect Survey suction trap at Rothamsted was obtained for 1968 fo 2005 
(inclusive). Years could be classified as eifher early (lasf record < 210) or lafe 
(lasf record > 280). These groupings may be linked fo fhe abundance of winged 
aphids in aufumn, wifh early years corresponding fo small (or absenf) aufumn 
migrafions. Dafa file shallot.dat holds unif numbers (ID) wifh fhe year (Year), 
date of lasf observafion (JDate) and classificafion as an early or lafe year (Group). 
Use regression wifh groups fo esfablish whefher fhere is any sfafisfical evi- 
dence fhaf fhe dafe of lasf observafion (response JDate) is changing over fime 
(explanatory variafe Year) and whefher any frend over fime differs befween 
early and lafe years (factor Group). Check for evidence of femporal correlafion. 
Wrife down your predicfive model and reporf your conclusions.! 

15.8 In Example 8.6, we analysed a designed experiment to investigate the affinity
of a sugar transporter protein for a substrate within plant cells. We modelled
the relationship between response logₑ(Km) and the equivalent voltage using
polynomial contrasts within ANOVA. The unit numbers (ID), structural factors
(Rep, DUnit), input voltage (variate Voltage) and response (variate Km) are held
in file voltage.dat (Table 8.23). Refit the model as a linear regression, including
replicates in the model. Is there any evidence of model misspecification? Check
for lack of fit and verify that this gives the same results achieved in Example 8.6.
Do you agree with the conclusions from our original analysis?

15.9 The impacts of several methods of forming ground cover in apple orchards were
compared in a designed experiment (Pearce, 1983). The standard method (code
O) was compared to five types of permanent crops (codes A-E). The experiment
used four blocks of six trees, and treatments were allocated at random to
trees within each block (a RCBD). The trees were old, and varied in productivity,
and so their total yield (bushels) over the previous 4 years was provided
as a covariate. The response was total yield (in pounds) over a 4-year period
with the new treatments. File apple.dat contains unit numbers (ID), the structural
factors (Block, DPlot), treatment codes (factor Trt) and the crop from each
tree before (variate PrevCrop) and during the experiment (variate TotalCrop).
Investigate the impact of the treatments, taking both the design and the covariate
into account. Is there any impact of including the covariate in the analysis?
Which treatments would you recommend?

* Data from A. Macdonald, Rothamsted Research.
† Data from A.E. Johnston, Rothamsted Research.
‡ Data from R. Harrington, Rothamsted Research.

15.10 An experiment investigated the oxygen consumption of wireworm larvae at
several temperatures (Bliss, 1970, Exercise 20.2). Consumption was expected
to vary with larval size, so uniform batches of larvae of different sizes were
tested and their mean weights were recorded. File oxygen.dat contains unit
numbers (Unit), the temperature group for each batch (factor Temperature), and
the natural logarithms of mean bodyweight (variate logBodyWt, mg) and oxygen
consumption per individual (variate logConsumption, mL/h). Is there any
evidence of differences in oxygen consumption between temperature groups
after taking body weight into account? What do you need to check before you
can answer this question? Write down a predictive model for oxygen consumption
at each temperature, and interpret the differences between temperatures.
(We re-visit these data in Exercise 17.8.)



16 
In Chapters 7 to 9, we showed how to analyse data arising from designed experiments
using multi-stratum ANOVA to take proper account of structure in the experimental
units and that this approach led to appropriate estimates of parameter standard errors.
However, multi-stratum ANOVA does not apply to unbalanced structures, and so its use
is limited. In Chapter 11, we saw that combining the explanatory and structural components
of the model - the so-called intra-block analysis - gives good results only for certain
types of structure. We therefore need a more general approach to account for structure,
and in this chapter, we introduce the class of linear mixed models. This class extends
multi-stratum ANOVA to the cases of unbalanced and non-orthogonal structures, and
extends regression models to include a structural component. We start with a short
discussion of the need to include structure in models (Section 16.1) and then give a more
formal definition of the linear mixed models that we use to achieve this (Section 16.2). We
then describe methods for investigating the explanatory component of the model (Section
16.3) and aspects of the structural component (Sections 16.4 and 16.5) before considering
prediction (Section 16.6) and model checking (Section 16.7). We analyse a data set in some
detail to illustrate the concepts discussed in the previous sections (Section 16.8), and we
explain some of the difficulties that can be encountered with this more general form of
model (Section 16.9). Finally, we give a general overview of extensions to this class of
models (Section 16.10).



16.1 Incorporating Structure 

In Chapters 7 and 9, we analysed experimental studies with structure such as hierarchical 
blocking and pseudo-replication. We specified models using two separate components: 
the explanatory component was used to describe the relationship between the explanatory 
variables and the response, and the structural component was used to describe structure 
present in the observations. We argued that incorporation of the structure is required to 
generate the correct parameter SEs and df for hypothesis testing, and we achieved this 
with multi-stratum ANOVA. 

In observational studies, structure is also often present and should be accounted for. For 
example, consider a large-scale ecological survey taken across fields growing several types 
of crop within designated farms in a region. The farms are not of interest in themselves, as 
they are intended to provide a representative sample, but systematic differences between 
farms are expected as a result of local management practices and so farms are regarded as 
a structural factor. Some explanatory variables might apply to whole farms, for example, 
type of farm, while others might be measured on individual fields, for example, crop (qual- 
itative) or field area (quantitative). Incorporating the structure of the observations (in this 
case, Farm/Field) ensures that explanatory terms are compared to background variation 
in the correct level or stratum. Similarly, within the same study, several samples might
be taken within each field to avoid bias due to small-scale variation. In the terminology
of Section 3.1.1, the within-field samples are pseudo-replicates. Separation of within- and
between-field variation (using the structural component Farm/Field/Sample) is required
to assess accurately the precision of estimates for the effects of different crops and the
relationship with field area.

Some statistical packages include algorithms for multi-stratum ANOVA which allow
specification of both the explanatory and structural components of the model. However,
algorithms for multi-stratum ANOVA require an orthogonal structure and balanced allocation
of treatments (see Chapter 11) which are rarely present in observational studies and
sometimes not in more complex designed studies. In Chapter 11, we demonstrated the
principles of combining the two components of the model (the intra-block analysis), and
showed that this approach is appropriate when most of the treatment information occurs
at the lowest level of the structure, but can be problematic otherwise, particularly when
treatments are applied at higher levels or when pseudo-replication is present (details in
Section 11.6.1). In these cases, it will often be better to use a linear mixed model (LMM),
specifying the model using two components, which are usually called the fixed and random
models. For both experimental and observational studies, it is usually reasonable to
allocate the terms of the structural component as random and the explanatory component
terms as fixed. In general, the choice of which terms to classify as fixed and which as random
depends on the aims of analysis, and we discuss this further in Section 16.5. A major
advantage of LMMs is that the structure does not have to be balanced, and the model may
contain any mixture of factors and variates. In the remainder of this chapter, we describe
briefly the analysis of LMMs, illustrated with two examples.



16.2 An Introduction to Linear Mixed Models 

A LMM is defined by the response, a fixed model and a random model. As stated above,
here we equate the fixed model with the explanatory component, describing treatments or
conditions that may affect the response, and equate the random model with the structural
component, describing any structure present in the study.

EXAMPLE 16.1A: WEED COMPETITION EXPERIMENT 

This experiment was introduced in Example 9.5, the layout was shown in Table 9.10 and 
the data are in file competition.dat. This split-plot experiment investigated the com- 
petitive effects of weeds (factor Species), with and without irrigation (factor Irrigation), 
on grain yield of winter wheat (variate Grain), with four blocks (factor Block). Within 
each block, two irrigation regimes were applied to whole plots (factor WholePlot), each 
of which was split into four subplots (factor Subplot) in which the different weed spe- 
cies (no weeds, Am, Ga, Sm) were sown. This experiment has a nested structure with 
three strata: blocks, whole plots within blocks, and subplots within whole plots, i.e. 
Block/WholePlot/Subplot. The explanatory component was a two-way crossed structure,
i.e. [1] + Irrigation*Species. Irrigation effects were estimated within the whole-plot
stratum and species effects and the interaction were estimated within the subplot stra- 
tum. The LMM for this design translates directly from the explanatory and structural 
components as 






    Response variable: Grain
    Fixed model:       [1] + Irrigation*Species
    Random model:      Block/WholePlot/Subplot

The assumptions behind the LMM differ slightly from those we have used previously.
The fixed model is set up in exactly the same way as the explanatory component, usu- 
ally with first-level-zero (or last-level-zero) parameterization (Section 11.2.1). The random 
model has a new set of assumptions, however. The effects associated with each random 
term are assumed to be a set of independent samples from a Normal distribution with
a common variance, which is known as the variance component for that term. We have 
previously assumed that structural terms represent variation due to the physical structure 
of the experimental material or procedure: for continuous data it is then often reasonable 
to interpret these effects as samples from a Normal distribution. The model deviations 
become just one of these random terms, and the assumptions made for the deviations 
(Sections 4.1 and 12.1) also apply to each of the random terms. In addition, it is assumed 
that effects from different random terms are independent. 



EXAMPLE 16.1B: WEED COMPETITION EXPERIMENT 

The mathematical model for this split-plot experiment, with first-level-zero parameter- 
ization and a crossed treatment structure, can be written as 



$$
\begin{aligned}
Grain_{ijk} = {} & \mu_{11} + Block_i + Irrigation_j + (Block.WholePlot)_{ij} + Species_k \\
                 & + (Irrigation.Species)_{jk} + e_{ijk} , \qquad (16.1)
\end{aligned}
$$

where $Grain_{ijk}$ is the grain yield for the kth weed species (k = 1 ... 4; 1 = no weeds, 2 = Am,
3 = Ga, 4 = Sm) with the jth irrigation treatment (j = 1, 2; 1 = without, 2 = with irrigation)
in the ith block, for i = 1 ... 4. The first-level-zero constraints impose $Irrigation_1 = 0$,
$Species_1 = 0$ and $(Irrigation.Species)_{jk} = 0$ for j = 1 or k = 1. Parameter $\mu_{11}$ represents the
population mean without irrigation or weeds, $Irrigation_2$ represents the effect of irrigation
with no weeds, $Species_k$ (k = 2 ... 4) represents the effect of the kth weed species
without irrigation, and $(Irrigation.Species)_{2k}$ is the effect of irrigation on the kth weed species
relative to the effect of irrigation without weeds (see Section 11.2.1). The effects
$Block_i$, i = 1 ... 4, are random block effects, assumed to be independent with common
distribution $Block_i \sim \text{Normal}(0, \sigma_b^2)$, the effects $(Block.WholePlot)_{ij}$ are random effects
of whole plots within blocks, assumed to be independent with common distribution
$(Block.WholePlot)_{ij} \sim \text{Normal}(0, \sigma_w^2)$, and the deviations, $e_{ijk}$, are assumed to be independent
with distribution $e_{ijk} \sim \text{Normal}(0, \sigma^2)$, as mentioned previously. The variance components
in this model are $\sigma_b^2$, $\sigma_w^2$ and $\sigma^2$.
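
To make the assumptions behind Equation 16.1 concrete, the following Python sketch simulates one data set from the model. It is an illustration only: the function names are hypothetical, and the numerical values are the estimates reported in this chapter (Example 16.1C, Table 16.1 and Table 16.3), treated here as if they were the true parameter values. The randomization of treatments to plots is ignored, as it does not affect the distributional assumptions.

```python
import random

# Fixed effects under first-level-zero parameterization. Values are the
# estimates from Example 16.1C and Table 16.1, used as if they were known.
MU_11 = 8.117
IRRIGATION = {1: 0.0, 2: -0.935}
SPECIES = {1: 0.0, 2: -4.632, 3: -1.437, 4: -1.522}
IRRIGATION_SPECIES = {2: 0.160, 3: -1.670, 4: -0.085}  # non-zero only for j = 2

# Variance components (Table 16.3), again treated as known.
VAR_BLOCK, VAR_WHOLEPLOT, VAR_RESIDUAL = 0.0893, 0.3100, 0.2610


def fixed_part(j, k):
    """Expected grain yield for irrigation level j and species level k."""
    interaction = IRRIGATION_SPECIES.get(k, 0.0) if j == 2 else 0.0
    return MU_11 + IRRIGATION[j] + SPECIES[k] + interaction


def simulate_split_plot(seed=1):
    """Simulate 4 blocks x 2 whole plots x 4 subplots from Equation 16.1."""
    rng = random.Random(seed)
    rows = []
    for i in range(1, 5):
        block = rng.gauss(0.0, VAR_BLOCK ** 0.5)           # one draw per block
        for j in (1, 2):                                    # one whole plot per irrigation level
            whole_plot = rng.gauss(0.0, VAR_WHOLEPLOT ** 0.5)
            for k in range(1, 5):                           # one subplot per species
                deviation = rng.gauss(0.0, VAR_RESIDUAL ** 0.5)
                rows.append({"block": i, "irrigation": j, "species": k,
                             "grain": fixed_part(j, k) + block + whole_plot + deviation})
    return rows


data = simulate_split_plot()
print(len(data), round(fixed_part(2, 3), 3))
```

Each block effect is drawn once and shared by all eight observations in that block, and each whole-plot effect is shared by the four subplots within it; this sharing is what induces the covariances between observations.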

The parameters of the LMM are the effects associated with the fixed terms and the vari- 
ance components associated with the random terms. The random effects have a slightly 
different status, which is discussed further in Section 16.5. There is no requirement for a 
LMM to have a balanced structure, and so estimation by least squares is not always efficient.
The usual alternative, maximum likelihood estimation, gives biased estimates of the
variance components and so Patterson and Thompson (1971) introduced a method called
restricted (or residual) maximum likelihood (REML) to estimate the variance components,
and this is the approach we take. The method estimates the variance components by
maximizing a quantity called the restricted (or residual) log-likelihood function (for more
details, see Littell et al., 2006). The fixed effects are then estimated by the method of generalized
least squares, conditional on the estimated values of the variance components. One
advantage of the REML method is that it gives the same estimates of fixed effects and SEs 
as obtained from multi-stratum ANOVA when the structure is balanced and, where treat- 
ment information is divided across strata, estimates will be combined efficiently across 
strata into a single estimate. 
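
The bias of maximum likelihood can be seen in the simplest possible case, a model with only a constant in the fixed model and only the deviations in the random model. In that special case, the maximum likelihood estimate of the variance divides the residual sum of squares by n, whereas REML divides by the residual df, n - 1, and so reproduces the usual unbiased sample variance. The Python sketch below illustrates this under those assumptions; the function name and the data values are hypothetical.

```python
import statistics


def ml_and_reml_variance(y):
    """Variance estimates for a constant-only model with independent deviations.

    ML divides the residual sum of squares by n; REML divides by n - 1,
    the residual df left after estimating the single fixed effect.
    """
    n = len(y)
    mean = sum(y) / n
    residual_ss = sum((value - mean) ** 2 for value in y)
    return residual_ss / n, residual_ss / (n - 1)


sample = [7.9, 8.4, 6.8, 9.1, 8.3]   # hypothetical yields
ml, reml = ml_and_reml_variance(sample)
print(ml < reml)                      # ML is always the smaller of the two here
```

With more fixed effects or several random terms, the calculation is no longer this simple, but the same principle applies: REML accounts for the df used in estimating the fixed effects.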



16.3 Selecting the Best Fixed Model 

The estimates of the fixed effects used with REML are often called BLUEs, which is an 
acronym for best linear unbiased estimates. The property of unbiasedness means that the 
expected value of the estimator is equal to the true parameter value. In this context, 'best' 
means that these estimates have minimum variance within the class of unbiased estima- 
tors, conditional on the variance components. In practice, we do not know the true values 
of the variance components and so substitute their REML estimates to obtain empirical 
BLUEs, often called eBLUEs. 

EXAMPLE 16.1C: WEED COMPETITION EXPERIMENT 

Fitting the split-plot experiment as a LMM with first-level-zero parameterization gives
the estimates of fixed effects for terms Species and Irrigation.Species shown in Table
16.1. The estimate of the constant was $\hat{\mu}_{11} = 8.117$ (SE 0.4063) and the estimated effect of
irrigation in the absence of weeds was $\widehat{Irrigation}_2 = -0.935$ (SE 0.5344).

As described in Chapters 8 and 11, we usually wish to investigate the contribution of 
individual terms within the explanatory component (or fixed model) in explaining pat- 
terns of response. We need to take proper account of the structural component (or random 
model) and any non-orthogonality in the explanatory component (fixed model). Because 
it is not possible to construct a multi-stratum ANOVA table for a general unbalanced 
structure, in LMMs we take a slightly different approach and construct test statistics that 
account for the experimental structure. 

TABLE 16.1

Estimated Fixed Effects with Standard Errors (SE) for Terms Species and Irrigation.Species
with First-Level-Zero Parameterization in the Weed Competition Experiment (Example 16.1C)

Term                 Parameter                    No weeds (k = 1)   Am (k = 2)   Ga (k = 3)   Sm (k = 4)   SE
Species              $Species_k$                   0.000             -4.632       -1.437       -1.522       0.3613
Irrigation.Species   $(Irrigation.Species)_{2k}$   0.000              0.160       -1.670       -0.085       0.5109

Because of non-orthogonality, we still need to consider both incremental and marginal
forms of these statistics; recall that incremental statistics reflect the change in fit on sequential
addition of individual terms into the fixed model, and marginal statistics reflect the
change on omission of individual terms from the full fixed model (see Sections 11.2 and
14.4). Recall that in a non-orthogonal structure, there may be many different sets of incremental
and marginal statistics, corresponding to different orders of adding terms into or
dropping terms from the model (see Section 11.2). We again follow the principles of model
selection discussed in Section 15.5.1. In particular, we respect the principle of marginality
and add a term only if all marginal terms are already present in the model (e.g. add A.B
in a crossed structure only if A and B are both present), and do not drop terms that are
marginal to other terms present in the model (e.g. do not drop A or B if A.B is in the model).
Here, we describe two types of test statistic in common usage in LMMs, Wald tests and
approximate F-tests.

For a model term associated with a single effect, the Wald statistic is equivalent to the 
square of the t-statistic obtained by division of the estimated effect by its estimated stan- 
dard error. When several effects are associated with a model term, the calculation is more 
complex. If the structure is orthogonal, then the Wald statistic is equivalent to the sum of 
squares for that term divided by the ResMS from the appropriate stratum. In the general 
unbalanced case, the marginal Wald statistic for a term is effectively the sum of squares 
of its estimated effects weighted by their estimated variance-covariance matrix. Under 
the null hypothesis of zero effects, on the assumption that the variance components are 
known, the Wald statistic has an approximate chi-squared distribution with df equal to 
the change in df when the term is added to the model (for an incremental test) or removed 
from the model (for a marginal test). This is a one-sided test, as estimates with either a 
large positive or negative value (with respect to their variance-covariance matrix) lead 
to a large positive value of the Wald statistic. As this distribution ignores the sampling 
variation associated with estimation of the variance components, it is analogous to the 
use of a Normal distribution rather than a t-distribution for the test for a single parameter 
estimate. We can see the impact of this approximation in the following example. 
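
Before turning to the example, the calculation of a Wald statistic as described above can be sketched in Python. The sketch assumes that the estimated effects for a term and their estimated variance-covariance matrix are already available from a fitted model; the function names are hypothetical, and the two-effect covariance matrix is made up purely for illustration.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def wald_statistic(effects, vcov):
    """Quadratic form b' V^{-1} b for estimated effects b with covariance matrix V."""
    weighted = solve(vcov, effects)
    return sum(b * w for b, w in zip(effects, weighted))


# Single effect: the statistic reduces to the squared t-statistic.
# Estimate and SE of Irrigation_2 are taken from Example 16.1C.
b, se = -0.935, 0.5344
print(round(wald_statistic([b], [[se ** 2]]), 3), round((b / se) ** 2, 3))

# Two correlated effects with a hypothetical covariance matrix.
print(round(wald_statistic([1.0, 2.0], [[1.0, 0.5], [0.5, 2.0]]), 4))
```

Note that the single-effect statistic here relates to the parameter Irrigation_2 in the full model, not to the incremental test for the Irrigation term, so it is not expected to match any entry in Table 16.2.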

EXAMPLE 16.1D: WEED COMPETITION EXPERIMENT 

This split-plot design is orthogonal, so there is a unique set of incremental Wald sta- 
tistics, and these are shown in the third column of Table 16.2. These statistics can be 
verified in each case to be equal to the SS for the term divided by the ResMS from the 
appropriate stratum (see the ANOVA in Table 9.12). The observed significance levels 
for the Wald statistics (column 4 in Table 16.2) are smaller than those from the multi- 
stratum ANOVA table and, although the conclusions do not change, the strength of 
the evidence from the Wald tests appears greater. However, if the assumptions for the 
deviations are true, then the variance ratios from the multi-stratum ANOVA have an 
F-distribution, giving a known baseline for comparison and indicating that the Wald 
tests are over-confident. 

TABLE 16.2

Wald Statistics with Observed Significance Levels (P(Wald)) and Approximate
F-Statistics with Estimated Denominator df (ddf) and Observed Significance Levels
(P(F)) for the Weed Competition Experiment (Example 16.1D)

Term                   df   Wald      P(Wald)        F         ddf    P(F)
+ Irrigation           1      9.480   0.002            9.480    3.0   0.054
+ Species              3    329.178   4.8 × 10^-71   109.726   18.0   9.3 × 10^-12
+ Irrigation.Species   3     16.747   8.0 × 10^-4      5.582   18.0   0.007

Some caution is therefore required in the use of Wald tests, which tend to be too optimistic,
i.e. to give false-positive results more often than would be expected. The reference
distribution for this test is known as an asymptotic approximation, which indicates that
it holds only for large samples. In fact, the phrase 'large samples' is slightly misleading
here, as the requirement is more specifically for the uncertainty in the variance-covariance
matrix of the estimates to be small. This is difficult to check in a general situation,
but in balanced situations requires that the ResDF should be large within strata where the
fixed terms are tested.

To avoid this problem, various methods exist to convert Wald statistics into a form that
has an approximate F-distribution, with denominator df that quantify uncertainty in
the estimation of variances. The most popular method was developed by Kenward and
Roger (1997, 2009). This method re-scales the Wald statistic so that it can be compared to
an F-distribution with numerator df equal to those of the model term and an estimated
denominator df. As with the Satterthwaite approximation (Section 9.2.3), the estimated
denominator df will often be non-integer. For balanced designs, F-tests based on the
Kenward-Roger method are identical to F-tests based on the variance ratios. This method
is available in most software for LMMs, and these approximate F-tests should usually be
preferred to the Wald tests.



EXAMPLE 16.1E: WEED COMPETITION EXPERIMENT 

The fifth and sixth columns of Table 16.2 show the F-statistics and denominator df 
derived from the Kenward-Roger method. In this balanced case, the derived F-tests 
can be obtained by division of the Wald statistic for each model term by its df, and the 
estimated denominator df are equal to the ResDF from the appropriate stratum in the 
ANOVA table (Table 9.12). The resulting F-tests and observed significance levels (column 
7 in Table 16.2) therefore exactly match those from the multi-stratum ANOVA table. 
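
The difference between the two reference distributions can be checked numerically for the Irrigation term, which has 1 df and a Wald statistic of 9.480. The Python sketch below is an illustration rather than a general routine: it relies on closed-form tail probabilities that hold only for these particular df (a chi-squared distribution with 1 df, and an F-distribution with 1 and 3 df, which is the square of a t-distribution with 3 df). The function names are hypothetical.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-squared distribution with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


def f_sf_1_3(f):
    """Upper-tail probability of an F-distribution with 1 and 3 df.

    Uses the closed-form CDF of the t-distribution with 3 df, because an
    F(1, 3) variable is the square of a t(3) variable.
    """
    x = math.sqrt(f) / math.sqrt(3.0)
    return 1.0 - (2.0 / math.pi) * (x / (1.0 + x * x) + math.atan(x))


wald = 9.480                      # Wald statistic for + Irrigation (Table 16.2)
p_wald = chi2_sf_1df(wald)        # compare with P(Wald) in Table 16.2
p_f = f_sf_1_3(wald / 1)          # F = Wald / df, with 3 denominator df
print(round(p_wald, 3), round(p_f, 3))
```

The same statistic gives much weaker evidence when the uncertainty in the whole-plot stratum variance, which is based on only 3 df, is taken into account.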



16.4 Interpreting the Random Model 

The variance components associated with the random terms generate a variance-covariance
matrix for the observations. The variance of an observation is equal to the sum of
the variance components, and it can be derived from the algebraic form of the model:
the variance of the fixed effects is zero, and the variance of each random effect equals
its variance component. The covariance between any two observations depends on the
random effects held in common across the observations, and it is the sum of the variance
components for these common random effects. These calculations are illustrated in the
following example.

EXAMPLE 16.1F: WEED COMPETITION EXPERIMENT 

The estimated variance components for the weed competition experiment are in Table 
16.3. The block variance component is smaller than the whole-plot and subplot variance 
components, which are similar in size. 

TABLE 16.3

Estimated Variance Components for the Weed Competition
Experiment (Example 16.1F)

Term                      Parameter      Estimate   SE
Block                     $\sigma_b^2$   0.0893     0.2732
Block.WholePlot           $\sigma_w^2$   0.3100     0.3072
Block.WholePlot.Subplot   $\sigma^2$     0.2610     0.0870

SE = estimated standard error.

To estimate the variance of a single observation from this experiment, we start with
the model in Equation 16.1. The fixed terms do not contribute to the variance, and we
have assumed that all random effects are independent (both within and across terms),
so we do not need to account for covariances between random effects, which are all
zero. The variance is thus equal to the sum of the variances of the random effects, which
is the sum of the variance components, i.e.

$$
\begin{aligned}
\text{Var}(Grain_{ijk}) &= \text{Var}(Block_i + Block.WholePlot_{ij} + e_{ijk}) \\
                        &= \text{Var}(Block_i) + \text{Var}(Block.WholePlot_{ij}) + \text{Var}(e_{ijk}) \\
                        &= \hat{\sigma}_b^2 + \hat{\sigma}_w^2 + \hat{\sigma}^2 \\
                        &= 0.0893 + 0.3100 + 0.2610 \\
                        &= 0.6604 .
\end{aligned}
$$

This estimated variance is the same for all observations. The estimated covariance 
between observations from different subplots within the same whole plot (and hence 
the same block) can be derived similarly. Again, only the random terms contribute to 
the covariance and as we have assumed the random effects are independent, covari- 
ances between different effects are zero: 



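$$
\begin{aligned}
\text{Cov}(Grain_{ijk}, Grain_{ijl}) &= \text{Cov}(Block_i + Block.WholePlot_{ij} + e_{ijk},\; Block_i + Block.WholePlot_{ij} + e_{ijl}) \\
                                     &= \text{Var}(Block_i) + \text{Var}(Block.WholePlot_{ij}) \\
                                     &= \hat{\sigma}_b^2 + \hat{\sigma}_w^2 \\
                                     &= 0.3994 .
\end{aligned}
$$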



The estimated covariance between observations from subplots in different whole plots 
within the same block is then equal to the estimated block variance component, 0.0893, 
and the covariance between observations from subplots in different blocks is zero. The 
covariance between observations therefore increases as their proximity within the hier- 
archical structure also increases. 
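
The rule used in this example, that the covariance between two observations is the sum of the variance components for the random effects they share, can be written as a short Python function. This is a sketch for the nested structure Block/WholePlot/Subplot only, with a hypothetical function name and with units labelled by hypothetical (block, whole plot, subplot) index triples. Because it sums the rounded estimates from Table 16.3, its results can differ in the last decimal place from the figures quoted above.

```python
# Estimated variance components from Table 16.3
VAR_BLOCK, VAR_WHOLEPLOT, VAR_RESIDUAL = 0.0893, 0.3100, 0.2610


def covariance(unit_a, unit_b):
    """Covariance between two observations under Block/WholePlot/Subplot.

    Each unit is a (block, whole plot, subplot) triple; a variance component
    contributes only if the two units share the corresponding random effect.
    """
    cov = 0.0
    if unit_a[0] == unit_b[0]:              # same block
        cov += VAR_BLOCK
        if unit_a[1] == unit_b[1]:          # same whole plot within that block
            cov += VAR_WHOLEPLOT
            if unit_a[2] == unit_b[2]:      # same subplot, i.e. the same observation
                cov += VAR_RESIDUAL
    return cov


print(round(covariance((1, 1, 1), (1, 1, 1)), 4))   # variance of one observation
print(round(covariance((1, 1, 1), (1, 1, 2)), 4))   # same whole plot
print(round(covariance((1, 1, 1), (1, 2, 1)), 4))   # same block, different whole plot
print(round(covariance((1, 1, 1), (2, 1, 1)), 4))   # different blocks
```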

Our original definition of the variance components, as variances of the random effects, 
required these variances to be positive. The interpretation of the variance structure in 
terms of variances and covariances between observations requires only that the total vari- 
ance is positive and that the variance of any linear combination of observations is also 
positive (this property is known as positive-definiteness). In general, we use random terms 
to reflect structure and we expect units with random effects in common to be more simi- 
lar than units without, and so variance components are usually expected to be positive. 
But occasionally circumstances arise when it is natural to allow variance components to 
take negative values. For example, in field experiments, blocks are laid out on areas of 
ground thought to be reasonably homogeneous with respect to fertility and other trends. 
If a mistake is made, then plots within the same block may be less alike than plots in dif- 
ferent blocks, and this can be modelled only by using a negative variance component for 
blocks. A similar effect can occur if shelves in a CE cabinet are used as blocks to account 
for differences in lighting, but in fact a temperature gradient from the front to the back
of the shelves has a much stronger effect. For these reasons, we prefer to allow variance
components to be negative when required. Even if the true values of the variance components
are positive, it is possible that they may be estimated as negative values due to sampling
variability, particularly for terms with few levels. Some statistical packages always
constrain estimates of variance components to remain positive, bounded below at zero
(e.g. R function lmer), while others (e.g. GenStat and SAS PROC MIXED) give a choice on
whether estimates should be constrained to remain positive or not. The default action differs
between packages, so you should always check the documentation.

The presentation of estimated variance parameters also differs between statistical packages:
SAS PROC MIXED and GenStat present the variance components, but R function lmer
presents the square root of the variance components (labelled as standard deviations),
arguing that these are easier to interpret as they are on the same scale as the observations.
Standard errors of the variance component estimates are often provided, and the estimated
variance components are often small compared with these SEs (e.g. Table 16.3), so it might
be natural to think of dropping these terms from the model. We advise against this course
for two distinct reasons. First, the SEs for variance components are reliable for testing only
when there is a large amount of information contributing to the estimate; again, the SEs
depend on an asymptotic approximation. A better approach is the use of likelihood ratio
tests (LRTs); however, these tests are still the subject of research and their description is outside
the scope of this book. Second, and more importantly in our context, the random terms
have been included to describe the structure of the observations. This structure is a property
of the data set and is used to obtain the denominator df for approximate F-tests: the removal
of terms means that the random model no longer serves this purpose. There are contexts in
which it is appropriate to try to simplify a random model, but this is not the case when it
represents the structural component.

There is one situation in which it may be sensible to allocate part of the structural component
as fixed rather than random terms. This situation occurs when there are few levels
in a random term, so its variance component is poorly estimated, and when the explanatory
terms vary at a lower level of the structure. For example, a RCBD with many treatments
might have only two replicate blocks. The estimate of the block variance component
is effectively based on only two agglomerated observations (related to the two block
effects) and is unlikely to be reliable. In this design, all the treatment comparisons are
made within blocks, and no information is lost by putting block as the first term into the
explanatory component (the intra-block analysis of Section 11.6.1) or, equivalently in the
context of a LMM, into the fixed model.

The remainder of this section explains the relationship between the variance compo- 
nents obtained from a REML analysis and the stratum variances obtained by multi-stra- 
tum ANOVA for a balanced set of data. 

16.4.1 The Connection between the Linear Mixed Model and Multi-Stratum ANOVA 

Estimates of variance components from REML are equivalent to those from multi-stratum
ANOVA when the structure is balanced. Within the ANOVA table, the variance
components are hidden contributors to the stratum variances, which are estimated by
the stratum ResMS. For a balanced nested structure, the relationship between the stratum
variances and the variance components is straightforward: each stratum variance is
constructed as a weighted sum of variance components relating to random effects from
that stratum and from all lower strata. The weight for each variance component is the
number of observational units corresponding to a single random effect from that term.
As an example, we consider the standard split-plot structure with m blocks, each with $t_A$
whole plots, each of which in turn contains $t_B$ subplots. We denote the stratum variances
for blocks, whole plots and subplots respectively as $\xi_b$, $\xi_w$ and $\xi$. These stratum variances
are related to the variance components as follows:

$$
\begin{aligned}
\xi_b &= \sigma^2 + t_B \sigma_w^2 + t_A t_B \sigma_b^2 , \\
\xi_w &= \sigma^2 + t_B \sigma_w^2 , \\
\xi   &= \sigma^2 .
\end{aligned}
$$

The block.wholeplot.subplot stratum is the lowest level, with random effects equal to the
model deviations. Each deviation corresponds to a single observation, and so the weight
for the subplot variance component ($\sigma^2$) is 1; this holds for all strata. In the block.wholeplot
stratum, there are contributions from the whole-plot and subplot random effects. There
are $t_B$ observations within each whole plot, so this is the weight for the whole-plot variance
component ($\sigma_w^2$), and again it applies to all higher strata. The block stratum contains
contributions from the block, whole-plot and subplot random effects. Each block contains
$t_A \times t_B$ observations and so this is the weight for the block variance component ($\sigma_b^2$).

EXAMPLE 16.1H: WEED COMPETITION EXPERIMENT 

We can relate the estimated variance components in Table 16.3 to the estimated stratum
variances obtained in Example 9.2. As in Section 9.2.3, we denote the estimates of stratum
variances provided by the Block, Block.WholePlot and Block.WholePlot.Subplot
residual mean squares as $s_b^2$, $s_w^2$ and $s^2$, respectively. To derive estimates of the stratum
variances, we use the formula given above, i.e.

$$
\begin{aligned}
s_b^2 &= \hat{\sigma}^2 + 4\hat{\sigma}_w^2 + 8\hat{\sigma}_b^2 = 0.2610 + (4 \times 0.3100) + (8 \times 0.0893) = 2.2158 , \\
s_w^2 &= \hat{\sigma}^2 + 4\hat{\sigma}_w^2 = 0.2610 + (4 \times 0.3100) = 1.5012 , \\
s^2   &= \hat{\sigma}^2 = 0.2610 .
\end{aligned}
$$

As expected, these estimates match the stratum ResMSs shown in Table 9.12. 
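
The weighted sums used in this example can be sketched as a small Python function for a balanced split-plot structure. The function and argument names are hypothetical; `t_a` is the number of whole plots per block and `t_b` the number of subplots per whole plot. Because the inputs are the rounded estimates from Table 16.3, the results agree with the values quoted above only to about three decimal places.

```python
def stratum_variances(var_block, var_wholeplot, var_residual, t_a, t_b):
    """Stratum variances for a balanced Block/WholePlot/Subplot structure.

    Each variance component is weighted by the number of observations that
    share a single random effect from that term.
    """
    subplot = var_residual
    wholeplot = var_residual + t_b * var_wholeplot
    block = var_residual + t_b * var_wholeplot + t_a * t_b * var_block
    return block, wholeplot, subplot


# Estimates from Table 16.3; 2 whole plots per block, 4 subplots per whole plot
s2_block, s2_wholeplot, s2_subplot = stratum_variances(0.0893, 0.3100, 0.2610, t_a=2, t_b=4)
print(round(s2_block, 3), round(s2_wholeplot, 3), round(s2_subplot, 3))
```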

From the formulae for the stratum variances, we can deduce that whenever the stratum
variance for a term is smaller than that for strata lower in the hierarchy, the variance com- 
ponent associated with that term must be negative. When statistical packages constrain 
variance components to remain positive, bounded below at zero, the variance component 
estimates may then not quite match those from the multi-stratum ANOVA table. Although 
the resulting differences are usually small, exact correspondence between multi-stratum 
ANOVA and REML is desirable and was one motivation for the development of the REML 
method. This is a strong argument for allowing negative estimates of variance components. 
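
To see how a negative variance component arises, the relationship between stratum variances and variance components can be inverted. The sketch below assumes the split-plot structure of Section 16.4 with 2 whole plots per block and 4 subplots per whole plot; the function name is our own, and the block stratum variance of 1.0 in the second call is a hypothetical value chosen to be smaller than the whole-plot stratum variance.

```python
def variance_components(s2_block, s2_whole, s2_sub, t_a, t_b):
    """Recover (block, whole-plot, subplot) variance components from stratum variances."""
    sigma2 = s2_sub
    sigma2_w = (s2_whole - s2_sub) / t_b
    sigma2_b = (s2_block - s2_whole) / (t_a * t_b)
    return sigma2_b, sigma2_w, sigma2


# Stratum variances quoted in Example 16.1H: all components are positive.
observed = variance_components(2.2158, 1.5012, 0.2610, t_a=2, t_b=4)

# Hypothetical case: the block stratum variance (1.0) is smaller than the
# whole-plot stratum variance (1.5012), so the block component is negative.
hypothetical = variance_components(1.0, 1.5012, 0.2610, t_a=2, t_b=4)

print([round(v, 4) for v in observed])
print([round(v, 4) for v in hypothetical])
```

A package that bounds variance components at zero would replace the negative value by zero and re-estimate the remaining components, which this sketch does not attempt.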



16.5 What about Random Effects? 

We stated earlier that the parameters of the LMM are the fixed effects and the variance
components, but we wrote down our models in terms of fixed and random effects. In this 
section, we discuss the status of the random effects. 

The LMM can be written in two forms. The model written in terms of both the fixed
and random effects, for example, Equation 16.1, is known as the conditional form since the
response is conditional on the random effects. The marginal form of the model is obtained
by integrating over the population of random effects. In the marginal form, the model is
specified in terms of the expected value of the observations, determined by the fixed terms
alone, and the variance-covariance matrix of the observations generated by the random
terms (as described in Section 16.4). Estimation takes place in the marginal model, the
parameters of which are the fixed effects and the variance components. However, we are
still often interested in the values of the random effects, and so would like to estimate
them. This is possible only when the variance component for the term is positive, as
random effects cannot be defined with a negative variance. Because the random effects are not
true parameters, we obtain predictors, rather than estimates, of their values that are called
BLUPs, an acronym for best linear unbiased predictors. In this context, the adjective
'unbiased' can be slightly misleading, as it means that the expected value of a predictor
is equal to the expected value of the population, which is zero. Given the (unknown) true
value of a random effect, its BLUP is biased towards zero, a property known as shrinkage.
The adjective 'best' here means that these predictors have minimum mean squared error
(defined as variance plus squared bias), conditional on the variance components. Again,
in practice, we do not know the true values of the variance components and so substitute
their REML estimates to obtain empirical BLUPs, often called eBLUPs. The property of
minimum mean squared error is attractive where accuracy in prediction is more important
than unbiasedness, and is sometimes used as a justification for assigning terms to the
random rather than the fixed model, particularly in the context of variety evaluation (see
Smith et al., 2005, for discussion in this context).

All random effects, including the deviations, are estimated from a REML analysis as
eBLUPs. This is different from multi-stratum ANOVA, which uses least-squares estimates
for terms in the structural component, and so estimated effects for structural terms and
residuals obtained from the two procedures often differ.
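
Shrinkage is easiest to see in the simplest possible case, which is not one of the models analysed in this chapter: a balanced one-way random effects model with n observations per group and known variance components. There the BLUP of a group effect is the deviation of the group mean from the overall mean multiplied by a factor between 0 and 1. The sketch below uses hypothetical variance components and function names of our own choosing.

```python
def shrinkage_factor(sigma2_u, sigma2, n):
    """Shrinkage factor for a balanced one-way random effects model.

    sigma2_u : variance component for the random group effects
    sigma2   : residual (deviation) variance
    n        : number of observations per group
    """
    return n * sigma2_u / (n * sigma2_u + sigma2)


def blup(group_mean, overall_mean, sigma2_u, sigma2, n):
    """BLUP of a group effect: the least-squares deviation shrunk towards zero."""
    return shrinkage_factor(sigma2_u, sigma2, n) * (group_mean - overall_mean)


# Hypothetical values: group variance 0.05, residual variance 0.11, n = 2.
factor = shrinkage_factor(sigma2_u=0.05, sigma2=0.11, n=2)
predicted = blup(group_mean=2.8, overall_mean=2.5, sigma2_u=0.05, sigma2=0.11, n=2)
print(round(factor, 3), round(predicted, 3))
```

The least-squares estimate of this group effect would be the full deviation of 0.3. The BLUP is smaller in absolute size, and the shrinkage becomes stronger as the group variance component decreases relative to the residual variance.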



16.6 Predicting Responses 

Prediction from LMMs follows the same basic principles laid out in Sections 11.2.5
and 15.5.2, but additional decisions must be made about the role of the random effects.
Predictions are based on the selected model: we form a table of fitted values from this
model classified by the explanatory variables (factors and variates) and then take marginal
means to obtain the predictions required. In the case of LMMs, we must decide whether
to make predictions conditional on the observed values of the random effects (known
as conditional or narrow-sense predictions), or to make predictions with respect to the
population of random effects (known as marginal or broad-sense predictions). All model
terms are used to form conditional predictions, and so in this case the table of fitted values
is classified by all the explanatory variables. For marginal predictions, each random term
contributes its population mean value (zero) to the fitted values, and the table is classified
only by variables that appear in the fixed model terms. Intermediate schemes are also
possible, where predictions are conditional with respect to some random terms and marginal
with respect to others. These options are discussed in some detail by Welham et al. (2004)
and McLean et al. (1991). Marginal and conditional predictions take the same value for
the LMMs considered here because the mean of the eBLUPs for random terms with
independent effects and common variance is equal to the population mean of zero. Marginal
predictions have larger SEs than conditional predictions because of the additional
uncertainty associated with predicting for an unknown population rather than for an observed
sample. Both types of prediction give the same SEDs for comparisons that do not directly
involve random effects.

EXAMPLE 16.1I: WEED COMPETITION EXPERIMENT

Marginal predictions for the irrigation by species combinations ($\hat{\mu}_{jk}$ for the jth
irrigation regime with the kth species) are formed by ignoring the structural component (i.e.
random model) terms and forming predictions as

$$\hat{\mu}_{jk} = \hat{\mu} + \widehat{Irrigation}_j + \widehat{Species}_k + \widehat{(Irrigation.Species)}_{jk} .$$



It is straightforward to verify that these predictions are equal to the treatment means 
given in Table 9.14, and they have a common SE equal to 0.4063. The multi-stratum 
ANOVA (Example 9.2) obtained a SE of 0.3778 for the same predictions, and the differ- 
ence occurs because the ANOVA SE ignores contributions from the block and whole- 
plot effects. The SEDs are equal to 0.5344 for comparisons across irrigation regimes, 
and 0.3613 for within-irrigation regime comparisons. These are the same as the SEDs 
obtained from multi-stratum ANOVA because the contributions from structural (ran- 
dom) terms cancel when taking differences. 
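
The SE and SEDs quoted in this example can be approximately reproduced from the rounded variance component estimates used in Example 16.1H. The sketch below makes two assumptions that are not stated in this section: that the design has four blocks (the value that reproduces the quoted figures) and that each irrigation by species combination occurs once in every block. Variable names are our own. Numerically, the quoted multi-stratum ANOVA SE is reproduced when the block variance component is left out of the calculation.

```python
from math import sqrt

# Rounded variance component estimates (Table 16.3, as quoted in Example 16.1H).
sigma2 = 0.2610     # subplot (deviation) variance
sigma2_w = 0.3100   # whole-plot variance component
sigma2_b = 0.0893   # block variance component
m = 4               # assumed number of blocks (not stated in this section)

# Marginal prediction: a mean over m blocks carries all three sources of variation.
se_marginal = sqrt((sigma2 + sigma2_w + sigma2_b) / m)

# Leaving out the block component reproduces the quoted ANOVA SE.
se_without_block = sqrt((sigma2 + sigma2_w) / m)

# Differences across irrigation regimes: block effects cancel.
sed_across = sqrt(2 * (sigma2 + sigma2_w) / m)

# Differences within an irrigation regime: block and whole-plot effects cancel.
sed_within = sqrt(2 * sigma2 / m)

print(round(se_marginal, 4), round(se_without_block, 4))
print(round(sed_across, 4), round(sed_within, 4))
```

Because the variance components are rounded, the results agree with the quoted values (0.4063, 0.3778, 0.5344 and 0.3613) only to about three decimal places.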



16.7 Checking Model Fit 

In this more general context of LMMs, model checking becomes both more important and
more complex. Model misspecification with respect to explanatory variates can be
investigated with the techniques described in Sections 13.1 and 14.6. Assumptions regarding the
random effects are investigated with the eBLUPs for each term (Section 16.5). The eBLUPs
for the deviations are the equivalent of simple residuals and can be used in the residual
plots described in Chapter 5 (see also Figure 16.3). Construction of the fitted values
plot requires some thought, as fitted values can be defined either to include or to exclude
eBLUPs associated with random terms (but always exclude the residual term). If random
terms are included in the fitted values, then shrinkage can induce correlation between the
residuals and the fitted values, so it is often better to exclude these terms. Histograms and
Normal quantile plots of eBLUPs can be used to check the distributional assumptions for
random terms.

There is no generally accepted analogue of the adjusted $R^2$ statistic to quantify the
explanatory performance of LMMs. It is generally acceptable to state whether fixed terms
show evidence of group differences (for factors) or linear trend (for variates), based on
the outcome of approximate F-tests. One approach to calculating the percentage variance
accounted for by the fixed model is based on the variance of an observation, as defined in
Section 16.3. The baseline total variance can be calculated as the sum of the variance
components when the constant term alone is included in the fixed model. This is compared
with the sum of the variance components for the fixed model under consideration, and
the percentage reduction measures the percentage variance accounted for by the fixed
model. This statistic can be used to quantify the performance of different fixed models for
a given structural component (random model). Note that the AIC and SBC described in
Section 14.8 cannot be used to compare LMMs with different fixed terms when the
variance parameters have been estimated by REML.



16.8 An Example 

In this section, we analyse a real set of data in some detail to draw together and illustrate
the ideas introduced in the previous sections.

EXAMPLE 16.2: WEED ABUNDANCE 

During 2000-2003, data were collected from an extensive UK-wide field experiment, 
known as the Farm Scale Evaluations (FSEs), to determine the ecological effects of man- 
agement regimes associated with either genetically modified (GM) herbicide-resistant 
or conventional crops. For each of four crops, the FSEs were designed with whole fields 
(blocks) split into two half-fields to which the treatments (factor Treatment, conventional 
or GM regime) were applied. Further information can be found in Case Study 19.1. Here, 
we consider only spring oilseed rape and analyse the counts of total weed abundance 
(variate Weeds) recorded in half-fields after the last herbicide application was made to
the GM crop ('post-herbicide'; Heard et al., 2003). The seedbank in each half-field (vari- 
ate Seedbank) was sampled before the crops were sown to provide a measure of initial 
seed densities. The aim of analysis here is to assess the impact of the two management 
regimes on weed abundance, taking into account the initial seedbank counts. 

The trials used 62 fields during the spring seasons of 2000-2002 (factor Year, labelled 
chronologically as 1-3). The fields were located on 37 farms (factor Farm). Only one field 
per farm was used in each year, but some farms were studied in 2 or 3 years, with a dif- 
ferent field used in each year (factor Field, numbered within farms as 1, 2 and 3). Half- 
fields (factor DHalf, labelled 1-2) are labelled systematically with respect to treatment 
although treatments were originally allocated to the halves at random (see Case Study 
19.1). Two fields without seedbank counts, plus one further field where a zero seedbank 
count was regarded as suspect, were excluded from the analysis, leaving 59 fields (118 
half-field data values). The data are held in file sosr.dat and shown in Table 16.4. 

The weed and seedbank data are plotted as counts and on log-log axes in Figure 16.1. 

We should usually take a log-transform of the weed counts to accommodate variance
heterogeneity; the log-log plot indicates that a linear relationship with seedbank counts is
obtained if this variate is also log-transformed. These variates were therefore transformed
to the $\log_{10}$ scale as LogWeeds = $\log_{10}$(Weeds) and LogSeedbank = $\log_{10}$(Seedbank).
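
As a minimal sketch of this transformation, the snippet below applies it to the two half-fields of field 1 on farm 1 (counts taken from Table 16.4). The tuple layout and variable names are our own. Note that a zero count cannot be log-transformed, so any such value has to be handled before this step.

```python
from math import log10

# (treatment, weed count, seedbank count) for farm 1, field 1 in Table 16.4.
half_fields = [("C", 195, 56), ("GM", 200, 93)]

# Base-10 logarithms of both variates, as used for LogWeeds and LogSeedbank.
transformed = [
    (treatment, log10(weeds), log10(seedbank))
    for treatment, weeds, seedbank in half_fields
]

for treatment, log_weeds, log_seedbank in transformed:
    print(treatment, round(log_weeds, 4), round(log_seedbank, 4))
```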

We can now consider a preliminary model for the logged weed counts. The struc- 
ture is hierarchical, with half-fields nested within fields, and fields nested within farms. 
Eighteen of the 37 farms have two or three separate fields used. In terms of these factors, 
we therefore write the structural component of the model as 

Structural component: Farm/Field/DHalf 

Since there is only one measurement per half-field, the half-field effects are the model 
deviations. The explanatory component of the model must account for year and treat- 
ment effects, and a crossed model is appropriate for these terms, i.e. 



Explanatory component: [1] + Year*Treatment 

TABLE 16.4

Weed and Seedbank Counts from Half-Fields under Conventional (C) or Genetically
Modified (GM) Management Regimes in the FSE Study (Example 16.2 and File sosr.dat)

                      Weeds        Seedbank                            Weeds        Seedbank
Farm  Field  Year      C     GM      C     GM    Farm  Field  Year      C     GM      C     GM
   1      1     1    195    200     56     93      17      1     2    741    780     70     70
   1      2     2    470    395    154    218      17      2     3    337    176     23     60
   1      3     3    432    192     68    103      18      1     2    113     56    150     68
   2      1     1    142    128     71    126      18      2     3    634    547     98    139
   2      2     2   1625    180     60    117      19      1     2   1302    692    241    271
   3      1     1    121     84     52     56      20      1     2    653    492    252    283
   4      1     1    505    115    156    145      21      1     2     73    163     49     55
   4      2     2    234    248    146    504      22      1     2    286    154     65    116
   4      3     3   1266   1166    256    289      22      2     3   1040    324    158     51
   5      1     2     54    125    311     73      23      1     2    487    288    100    153
   5      2     3     68    406     49    237      23      2     3    702   1388    239    543
   6      1     2    104     48     69    190      24      1     2    473    225     44     41
   7      1     2     42     19     20      7      24      2     3    485    270    240    178
   8      1     2    255    387     59     39      25      1     2   1631   7875    251    384
   8      2     3    101    121     40     19      25      2     3    640    587    471    413
   9      1     1   1815    381    133    128      26      1     2    358     25    241    216
   9      2     2    403    461    182    120      26      2     3    198     46     50    149
   9      3     3    817   1395    734    969      27      1     2     29    292     33    110
  10      1     2     40    111    126     79      28      1     3    244    178     88     29
  10      2     3    203    327     99     51      29      1     2    921    178     89     51
  11      1     1    125    558     60    124      30      1     1    376    263    173    563
  11      2     3     66    149     46     57      30      2     2    248     55    113     50
  12      1     1    432    272     26     50      30      3     3    404    482    213    340
  12      2     2    636     25    149    156      31      1     3   2103    367    394    530
  12      3     3    356     51    102     55      32      1     3    354    233     72     66
  13      1     2    449     56     62     61      33      1     3    403    142    136     56
  14      1     2   2620   1743    302    260      34      1     3    261    310     25     85
  15      1     2    314    602    487    152      35      1     3   2041   1176    389    693
  16      1     3    708    571     85    167      36      1     2    171    677     50     88
                                                   37      1     2    701    352    162    142

Source: Data from M. Heard, Centre for Ecology and Hydrology. 

As a baseline, we first analyse the logged weed counts (LogWeeds) ignoring the initial 
seedbank counts. In addition, as we should ideally like to regard the initial seedbank 
counts as a covariate (see Section 15.4), we also use this model to check whether that 
covariate (LogSeedbank) is related to the explanatory terms. 

To fit this as a LMM, the structural component becomes the random model and the
explanatory component becomes the fixed model. Table 16.5 shows the estimated vari- 
ance components for the two responses. Since the weed and seedbank counts are on 
different scales, we do not compare the values of the estimated variances for the two 
responses, but we do compare the pattern of relative sizes across strata. 

In both cases, all three variance components are positive. This indicates some similarity
across fields within farms, and across halves of the same field. This is expected, as weed
management practices will differ between farms, and weed infestation often varies across
fields within farms. For LogWeeds, the variation between half-fields is about twice that
generated by farm and field effects, which are of similar sizes. For LogSeedbank, the
three variance components are of similar sizes. This suggests that variation between
half-fields is relatively smaller for the initial seedbank counts. Again, this might be expected
where fields have been managed as single entities prior to the trial.

FIGURE 16.1

Weed counts for (•) conventional and (o) GM management regimes plotted against initial seedbank counts with
both variables (a) untransformed and (b) transformed to logarithms (Example 16.2).

TABLE 16.5

Estimated Variance Components for $\log_{10}$-Transformed
Weed and Seedbank Counts with Fixed Model
[1] + Year*Treatment (Example 16.2)

Random Term                      LogWeeds   LogSeedbank
Farm                               0.0517        0.0521
Farm.Field                         0.0628        0.0539
Farm.Field.DHalf (deviations)      0.1139        0.0443

Table 16.6 shows the approximate F-tests for the fixed terms in these preliminary
models. The Year and Treatment factors are orthogonal, so there is a unique table of
incremental tests that can be used to investigate fixed terms. None of the fixed terms is
significant for LogSeedbank. This matches our prior expectations as seedbanks were
assessed before sowing and, given the relative consistency of seedbank counts within
whole fields, the possibility of an 'unlucky' random allocation is small. This analysis
also confirms that there was no consistent difference in initial seedbank count across
years or across treatments within years. For LogWeeds, the Year.Treatment interaction
and Year terms are not significant, indicating no differences across years, but the
Treatment term is significant, and indicates a consistent difference between the two
management regimes, with the conventional treatment 0.145 units (SE 0.0719) larger
than the GM treatment on the $\log_{10}$ scale.

TABLE 16.6

Incremental F-Statistics with Denominator df (ddf) and Observed Significance Level
(P) for $\log_{10}$-Transformed Weed (LogWeeds) and Seedbank (LogSeedbank) Counts
with Fixed Model [1] + Year*Treatment (Example 16.2)

                           LogWeeds                       LogSeedbank
Term               df   F-Statistic    ddf      P      F-Statistic    ddf      P
+ Year              2         1.068   34.7  0.355            0.728   32.5  0.491
+ Treatment         1         5.415   56.0  0.024            1.413   56.0  0.240
+ Year.Treatment    2         0.048   56.0  0.953            1.576   56.0  0.216

To try to understand the denominator df used by the approximate F-tests, we can
construct a dummy multi-stratum ANOVA table, as in Table 16.7. The structural
component generates strata for farms (Farm), fields within farms (Farm.Field) and
half-fields within fields (Farm.Field.DHalf). There are 37 farms, so the Farm stratum has a
total of 36 df. There are 12 farms with two fields and five farms with three fields used,
giving 12 + (5 × 2) = 22 df in total for the Farm.Field stratum. Finally, since each of
the 59 fields has observations made on both halves, there are 59 df in total in the
Farm.Field.DHalf stratum. Since treatments are applied to half-fields, the Treatment
and Year.Treatment effects are estimated entirely within the Farm.Field.DHalf stratum,
which removes three df (one for Treatment and two for Year.Treatment) and
leaves 56 ResDF. This is the denominator df used by the F-tests for the Treatment and
Year.Treatment terms, as we should expect. Although each farm uses a different field
in each year, only five farms are present in all 3 years, and so effects for the Year term
are estimated partly within and partly across farms, and the denominator df for this
term is derived from both the Farm and Farm.Field strata. These denominator df then
depend on both the allocation of information (which is the same for both responses)
and on the relative values of the Farm and Farm.Field variance components (which
differ), and so the denominator df for Year differ slightly for the two responses (estimated
as 34.7 for LogWeeds and 32.5 for LogSeedbank).
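
The bookkeeping for the stratum totals can be sketched as follows. The function name and dictionary keys are our own; the calculation simply counts the units at each level of the nested Farm/Field/DHalf structure and subtracts the three df used by Treatment and Year.Treatment in the lowest stratum.

```python
def dummy_anova_df(n_farms, n_fields, halves_per_field=2, within_field_fixed_df=3):
    """Total df for each stratum of a nested Farm/Field/DHalf structure.

    within_field_fixed_df is the number of df used by fixed terms estimated
    entirely within fields (here Treatment plus Year.Treatment).
    """
    farm_total = n_farms - 1                         # between farms
    field_total = n_fields - n_farms                 # fields within farms
    half_total = n_fields * (halves_per_field - 1)   # half-fields within fields
    return {
        "Farm": farm_total,
        "Farm.Field": field_total,
        "Farm.Field.DHalf": half_total,
        "DHalf residual": half_total - within_field_fixed_df,
        "Total": farm_total + field_total + half_total,
    }


df = dummy_anova_df(n_farms=37, n_fields=59)
for stratum, value in df.items():
    print(stratum, value)
```

The total is one less than the 118 half-field observations, as it should be.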

TABLE 16.7

Dummy Multi-Stratum ANOVA Table for the FSE Study with
59 Fields (Factor Field) on 37 Farms (Factor Farm) over 3 Years
(Factor Year), and Two Treatments (Factor Treatment) Applied
to Half-Fields within Fields (Factor DHalf) (Example 16.2)

Term                          df
Farm stratum
  Year                         2
  Residual                    34
Farm.Field stratum
  Year                         2
  Residual                    20
Farm.Field.DHalf stratum
  Treatment                    1
  Year.Treatment               2
  Residual                    56
Total                        117

Given the linear relationship between LogWeeds and LogSeedbank apparent in
Figure 16.1, we might be able to improve our estimate of treatment effects by adjusting
for the initial seedbank counts. Since most of the variation in seedbank counts occurred
between rather than within fields, we expect that accounting for the initial seedbank
will not have much impact on the estimated treatment effect (which is estimated from
within-field comparisons), but we hope that accounting for this variation might increase
the precision of the estimate (i.e. decrease its SE). We first check whether the
relationship with the logged seedbank count is the same for both treatments (and across years)
by fitting a model with separate slopes (see Section 15.4), leading to the fixed terms in
a LMM as

Explanatory component: [1] + LogSeedbank*Year*Treatment

The terms in this model are not orthogonal, and so there are many sets of incremental
Wald tests. Here, we use marginal F-tests to select the predictive model. In the full
model, we can test only the three-way term LogSeedbank.Year.Treatment and we find
that it is not significant (Model 1 in Table 16.8, F = 0.262, P = 0.770). We therefore
drop this term and refit with all the remaining fixed and random terms. We can now
test all of the terms containing two variables with the marginal F-tests shown for Model
2 in Table 16.8. None of these terms appears significant, so we drop the least significant
of them first (LogSeedbank.Year with $F_{2,75.5}$ = 0.075, P = 0.927), refit the model, and find
we can then drop each of the other two-variable terms in turn (Year.Treatment then
LogSeedbank.Treatment). The relationship with initial seedbank counts is therefore
consistent across both treatments and years, and treatment differences also appear
consistent across years. This leaves only the single-variable terms in the model (Model 3 in
Table 16.8). Marginal F-tests show no evidence of consistent differences between years
($F_{2,43.5}$ = 0.698, P = 0.503) and so the term Year can be dropped, but the remaining terms
LogSeedbank and Treatment both have significant marginal F-tests, and form the
predictive model. The full set of parameter estimates from this predictive model is shown
in Table 16.9.

TABLE 16.8

Marginal F-Tests from Explanatory Models for $\log_{10}$-Transformed Weed
Counts with Observed Significance Level (P) (Example 16.2)

Term     Model 1                  Model 2                             Model 3
[1]
L        —                        —                                   $F_{1,85.3}$ = 31.797 (P < 0.001)
Y        —                        —                                   $F_{2,43.5}$ = 0.698 (P = 0.503)
T        —                        —                                   F = 7.969 (P = 0.007)
Y.T      —                        $F_{2,61.0}$ = 0.103 (P = 0.902)    *
L.Y      —                        $F_{2,75.5}$ = 0.075 (P = 0.927)    *
L.T      —                        $F_{1,63.7}$ = 0.081 (P = 0.777)    *
L.Y.T    F = 0.262 (P = 0.770)    *                                   *

Variable names: L = LogSeedbank, Y = Year, T = Treatment. — = term in model but not
eligible for testing, * = term omitted from model.

TABLE 16.9

Parameter Estimates with Standard Error (SE) from the Final Linear
Mixed Model for $\log_{10}$-Transformed Weed Counts (Example 16.2)

Parameter                         Estimate      SE
Farm variance component              0.004   0.023
Farm.Field variance component        0.049   0.031
Residual variance                    0.109   0.021
Constant                             1.297   0.221
Treatment effect (C - GM)            0.173   0.061
Coefficient for LogSeedbank          0.612   0.106

This predictive model can be written in algebraic form with first-level-zero
parameterization as

$$\hat{\mu}_{ijk}(LogSeedbank) = \hat{\mu} + \widetilde{Farm}_i + \widetilde{Farm.Field}_{ij} + \widehat{Treatment}_k + \hat{\beta}\, LogSeedbank .$$

Here, $\hat{\mu}_{ijk}(LogSeedbank)$ is the predicted $\log_{10}$-transformed weed count for a given value
of the $\log_{10}$-transformed seedbank count (LogSeedbank) with the kth treatment (k = 1, 2;
1 = C, 2 = GM) in the jth field within the ith farm, for i = 1 ... 37, j = 1, 2, 3. We
represent eBLUPs with a tilde (~) rather than a hat (^) embellishment, which we reserve for
estimates of the model parameters. The intercept, $\hat{\mu}$ = 1.297 (SE 0.2205), represents the
prediction for the conventional (C) treatment for a zero value of LogSeedbank, which is
outside of the observed range and so is an extrapolation. The estimated treatment effect
(C - GM) of 0.173 (SE 0.0611) indicates that the intercept for the C treatment is 0.173 units
larger than that for the GM treatment. This difference is a little larger than that found in the
initial analysis, with a smaller SE. In this parallel lines model, this difference between the
treatments is the same for any value of LogSeedbank. The slope of the linear relationship
between LogWeeds and LogSeedbank has estimate $\hat{\beta}$ = 0.612 (SE 0.1055). We conclude that
the number of weeds increases as the initial seedbank increases, and that the GM management
system reduces the number of weeds. The predicted responses (omitting random effects)
are shown with 95% CIs in Figure 16.2, together with the observations adjusted for Farm
and Farm.Field eBLUPs. It is clear that the model follows the observed pattern
reasonably well.
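
Marginal predictions from this parallel lines model (that is, with the farm and field effects set to their population mean of zero) can be sketched directly from the estimates in Table 16.9. The function and constant names below are our own, and the seedbank count of 100 is simply an illustrative value within the observed range.

```python
from math import log10

# Fixed-effect estimates from Table 16.9.
CONSTANT = 1.297               # intercept for the conventional (C) treatment
SLOPE = 0.612                  # coefficient for LogSeedbank
TREATMENT_C_MINUS_GM = 0.173   # C minus GM on the log10 scale


def predict_log_weeds(seedbank_count, treatment):
    """Marginal prediction of log10(weed count), omitting random effects."""
    prediction = CONSTANT + SLOPE * log10(seedbank_count)
    if treatment == "GM":
        prediction -= TREATMENT_C_MINUS_GM
    elif treatment != "C":
        raise ValueError("treatment must be 'C' or 'GM'")
    return prediction


log_c = predict_log_weeds(100, "C")
log_gm = predict_log_weeds(100, "GM")

# On the count scale the constant difference becomes a constant ratio.
ratio_c_to_gm = 10 ** (log_c - log_gm)
print(round(log_c, 3), round(log_gm, 3), round(ratio_c_to_gm, 2))
```

Because the lines are parallel on the log scale, the back-transformed ratio of conventional to GM weed counts is the same for every seedbank count.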

A composite set of residual plots is shown in Figure 16.3, with fitted values calculated
excluding the random effects Farm and Farm.Field. The distribution of the residuals
appears a little skewed to the left, and the fitted values plot shows a few large negative
residuals for fitted values around 2.50; these can also be seen in Figure 16.2. These
residuals correspond to half-fields with much smaller weed counts than would be expected
from their initial seedbank counts. Of the four most negative residuals, two come from
each treatment group. We suspect that these discrepancies are caused by patchiness of
the weed populations.

FIGURE 16.2

Fitted model (solid line) with 95% CI (dashed curved lines) for $\log_{10}$(Weed count) in terms of $\log_{10}$(Seedbank
count) for (a) conventional or (b) GM management, with observations (•) adjusted for farm and field effects
(Example 16.2).

FIGURE 16.3

Set of composite residual plots from the predictive model for $\log_{10}$(weed count) (Example 16.2).

The random farm effects have estimated variance of 0.0040, and the random field
within farm effects have estimated variance 0.0494. The reduction in the estimated
Farm variance component compared with the analysis without the covariate (Table
16.5) suggests that the initial seedbank accounts for most of the overall farm
differences in weed count, which may in turn reflect differing farm management practices.
To calculate the percentage variance accounted for by the final model, we fit a baseline
model with random terms Farm/Field/DHalf, as mentioned previously, and the fixed
term [1]. The estimated variance components are 0.0542 (Farm), 0.0589 (Farm.Field) and
0.1187 (Farm.Field.DHalf), giving total variance equal to 0.2318. From Table 16.9, the
sum of variance components under the final model is equal to 0.1628. The percentage
variance accounted for by the fixed terms is therefore calculated as
100 × (0.2318 - 0.1628)/0.2318 = 30%.
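
This calculation is easily checked. The sketch below uses the variance components quoted above; the variable names are our own.

```python
# Variance components for the baseline model (fixed term [1] only).
baseline_components = {"Farm": 0.0542, "Farm.Field": 0.0589, "Farm.Field.DHalf": 0.1187}

# Sum of the variance components under the final model, as quoted in the text.
final_total = 0.1628

baseline_total = sum(baseline_components.values())
percentage_accounted = 100 * (baseline_total - final_total) / baseline_total
print(round(baseline_total, 4), round(percentage_accounted, 1))
```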



16.9 Some Pitfalls and Dangers 

Because LMMs are a more general class of model, allowing multiple random terms with- 
out any requirement for balance, the iterative algorithms used for estimation are also more 
general, with more possibility of failure. If such failures occur, there are several possible 
causes that should be considered. 

The first possibility is that some of the variance components are not estimable. If the
same, or equivalent, terms are put into both the fixed and random models, then no infor- 
mation is available for estimation of the variance component corresponding to the random 
term. Occurrence of this problem is not always obvious, as it is often possible to generate 
equivalent terms from combinations of different factors. A similar problem occurs if, in 
the terminology of balanced designs, there are zero residual df within any stratum, as the 
corresponding variance component then cannot be estimated. This usually indicates a lack 
of real replication for some combination of explanatory factors in the study. In either case, 
removal of the offending random term from the model also removes the corresponding 
stratum from the structure and the remaining variance components should be estima- 
ble. However, any explanatory terms that should be tested in that stratum will instead be 
tested in a lower stratum, and this should be reported as part of the analysis. 

Problems may also occur for variance components that are estimable, but estimated as 
a negative value. As discussed in Section 16.4, some algorithms permit only positive esti- 
mates that are bounded below by zero, whereas others permit negative estimates; unfor- 
tunately both approaches have problems associated with them. Variance components that 
are fixed at a lower bound of zero are ignored by the Kenward-Roger method. The impact 
on approximate F-tests (Section 16.3) for explanatory variables tested at that level of the 
structure is equivalent to dropping the random term from the model, so that the resulting 
denominator df corresponds to a lower stratum. In some implementations, variance com- 
ponents are internally parameterized on a log scale which forces them to remain positive. 
This can lead to an apparent failure of convergence where the estimate should be zero (or 
negative), as zero estimates can never be reached on the log scale. This situation is easily 
detected if monitoring of the iterated estimates is examined. Finally, if negative estimates 
of variance components are allowed, occasional instability of the algorithm may result. 
The causes of this instability are usually due to difficulties in imposing positive definite 
constraints on the variance-covariance matrix as a whole during the estimation process. 
However, where negative estimates can be obtained, they can be used properly within the 
Kenward-Roger method. 

Finally, we re-emphasize the role of the estimated variance components in the eBLUEs 
and eBLUPs. These estimates and predictors are usually treated as if the variance compo- 
nents were known, whereas in fact they are not. Uncertainty in the variance components 
leads to additional uncertainty in the eBLUEs and eBLUPs that is not accounted for in their 
SEs. An alternative approach that does account for this uncertainty is the use of Bayesian 
mixed models. One feature of this approach is the requirement for prior information (i.e. 
distributions) on the fixed effects and variance components; while uninformative priors 
for fixed effects are well established, the natural scale for priors on variance components 
is less clear (Gelman, 2006). 



16.10 Extending the Model 

In this chapter, we have briefly introduced some aspects of LMMs. This class of models 
can be applied in many different situations, but its flexibility can also be a weakness: it 
can be easy to fit an inappropriate model and to obtain misleading results. It is therefore 
vital to assess the model and results critically before proceeding to interpret them. Galwey 
(2006) and Littell et al. (2006) provide good introductions to LMMs. 
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We have discussed only the subset of LMMs in which the random effects for each ferm 
are independenf wifh common variance; fhese are offen called variance componenf mod- 
els. In addifion, we have insisfed fhaf fhe random model corresponds fo fhe sfrucfural 
componenf of fhe model, wifh fhe fixed model corresponding fo fhe explanafory ferms. 
This approach can be generalized in several differenf ways, which we consider in furn. 

There is rather more flexibility in the allocation of terms as fixed or random than our 
recipe of explanatory = fixed, structural = random acknowledges. There are several 
different grounds for assigning a term to the random model. First, if we believe that a set of 
random effects truly represents a sample from a (Normal) population then it is natural to 
assign them as random. This often applies to structure within experiments: those 
positions or locations or subsamples used are naturally regarded as a sample of the wider 
population that might have been used. In the context of field experiments repeated over 
several years, this reasoning often leads to treatment × year interactions being regarded 
as random. Second, if the aim of the experiment is to model variation across factor levels 
explicitly, then it is again natural to assign the factor as a random term. Finally, in Section 
16.5, we noted the minimum mean squared error property of BLUPs, and remarked that 
this can be used as a motivation for fitting variety effects as random rather than fixed 
where the aim is accurate prediction of relative variety performance across a set of trials. 
This principle can be applied more widely, and again leads to explanatory terms being 
assigned to the random model. It is important to remember that adding terms into the 
random model changes the variance-covariance structure applied to the observations, and so 
may have an impact on SEs for other explanatory terms, and on the estimated 
denominator df for approximate F-tests. 
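
The effect of the random model on the variance-covariance structure can be made explicit 
in matrix form. The sketch below uses generic notation that is an assumption made here for 
illustration (X and Z are design matrices for the fixed and random terms, and the symbols 
may differ from those used earlier in the chapter); it shows a variance component model 
with a single random term. 

```latex
y = X\beta + Zu + e, \qquad u \sim N(0, \sigma_u^2 I), \qquad e \sim N(0, \sigma^2 I),
\qquad \mathrm{var}(y) = \sigma_u^2 ZZ^{\mathsf{T}} + \sigma^2 I .
```

Moving a term from the fixed model (a column block of X) into the random model (a column 
block of Z) adds a component of this form to var(y), which is why SEs and denominator df for 
the remaining fixed terms can change. 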

Variance component models can be generalized if we allow more general variance-covariance 
models on the random effects, or on the deviations, and this leads to correlated error 
models. These models are widely used in the analysis of longitudinal data (repeated 
measurements), as they can model the correlation between successive measurements made on 
the same subject or unit, as well as allowing for changes in variance over time. A detailed 
review of this area is given by Verbeke and Molenberghs (2000). This type of model can 
also be used to account for spatial correlation in either experimental or observational 
settings, for example small-scale smooth trend across a field or glasshouse bench. Gilmour 
et al. (1997) give some detailed examples in the context of field experiments, but the same 
principles apply more widely. As these models become more complex, the dangers of 
misspecification and algorithmic problems also increase, so additional care and thought is 
required. 

It is also possible to use smoothing splines, or penalized splines, to model non-linear 
responses within the framework of LMMs. These models do not have a pre-specified form; 
instead, the fitted response is determined by the observed trend in the data. The 
implementation within LMMs is facilitated by a coincidence in the form of the equations for 
estimating the spline for a given smoothness and those for estimating eBLUEs and eBLUPs 
in a specific LMM. The smoothness of the fitted curve is determined by the smoothing 
parameter, which is usually estimated via a variance component within the LMM context. 
You can find a good introduction to this topic in Ruppert et al. (2003). 

Finally, the class of LMMs has been extended to apply to non-Normal 
responses and non-linear models. The exact evaluation of the restricted log-likelihood 
function used to obtain REML estimates is much harder in these contexts, involving 
complex integrations, and so simpler approximate methods are often used. The main 
drawback to these methods is that the user might not know when the approximation is good 
enough to draw firm conclusions; this is currently an area of statistical research. Section 
17.3 introduces the class of non-linear models, and these can be extended to include random 
effects through the class of non-linear mixed models (Pinheiro and Bates, 2000). Chapter 18 
introduces the class of generalized linear models (GLMs) that can be used to model certain 
types of non-Normal responses, and this class can also be extended to include random 
effects (see e.g. Lee et al., 2006). 



EXERCISES 

16.1 In Exercise 9.8, you analysed a (slightly non-standard) split-plot design with 
multi-stratum ANOVA. Convert the explanatory and structural components for 
this experiment into a linear mixed model and fit this model. Obtain estimates 
of the variance components and verify that these match those obtained from 
the multi-stratum ANOVA. Use approximate F-tests to identify a predictive 
model and obtain predictions from this model and verify that the results match 
those from the multi-stratum ANOVA. 

16.2* An experiment was done to establish conditions for infection of young brassica 
plants with a foliar disease. Plants were subjected to different temperatures and 
periods of leaf wetness after exposure to one of two isolates of the pathogen. 
The experiment used four CE cabinets, with temperatures (5°C, 10°C, 15°C, 
20°C) allocated to cabinets and combinations of isolate (type 1 or 2) and leaf 
wetness (8, 16, 24, 48 or 72 h) allocated at random to trays within cabinets, with 
the four plants in each tray receiving the same treatment. (The original 
randomization has been lost so cabinets and trays are labelled systematically.) The 
experiment was done in four runs, with random allocation of temperatures to 
cabinets within each run. Four treatment combinations were omitted from the 
full factorial set (as pilot studies showed them to produce no infection) and 
temperatures 15°C and 20°C were omitted from the third and fourth runs, 
respectively. The unit numbers (ID), structural factors (Run, Cabinet, Tray, Plant) and 
treatment factors (Temp, Wetness, Isolate) are held with the response (variate 
TotLesions, count of total lesions) in file lesions.dat. Write down the explana- 
tory and structural components for the design of this experiment and trans- 
late these into a linear mixed model. Fit this model (using a transformation to 
account for variance heterogeneity if required) and use approximate F-tests to 
determine a suitable predictive model. What would you recommend as optimal 
conditions for infection in future experiments? 

16.3 Exercises 11.3, 11.5, 11.6 and 11.7 comprised the intra-block analysis of a 
designed experiment by fitting block effects before treatment effects. We will 
now re-examine these experiments using mixed models to investigate when 
the intra-block analysis gives a good approximation to analysis by mixed mod- 
els. Allocate the structural model terms as random and the explanatory terms 
as fixed and repeat the analysis. Obtain predictions for the target explanatory 
variable (stated below) with SE and SED and compare these to results from 
the intra-block analysis. Can you understand and explain any differences? 
Can you identify features of the data sets that make the use of mixed models 
advantageous? 

a. Identification of economic conditions for growing peppers in a glasshouse 
(Exercise 11.3). You must select a predictive model (using approximate 
F-tests) and produce predictions from the selected terms. 

b. Measurement of protein by NIRS (Exercise 11.5). Obtain predictions for the 
full set of accessions. 

c. Measurement of shoot growth for different pruning strategies (Exercise 11.6). 
Assess differences between the pruning strategies and produce predictions. 

d. Estimation of variety differences from a vandalized experiment (Exercise 
11.7). Obtain a set of variety predictions. 

16.4 A dose-response experiment investigated the action of three insecticidal seed 
treatments on three clones of aphid. Eight doses (including a zero dose) of each 
insecticide were applied to batches of seed, and the (average) actual dose of each 
insecticide applied was recorded. Three plants were grown from each of these 
24 treatments, and one plant with each treatment was allocated to each type of 
aphid clone. The experiment used six cages of 12 plants, with an unbalanced 
design (an alpha design) allocating the 72 treatment combinations to cages and 
plants. Adult aphids of the designated clone were introduced onto each plant 
and the number of nymphs present after 2 days was counted. The experiment 
was conducted in two runs, with each treatment combination present once in 
each run. File cage.dat holds the unit numbers (ID), structural factors (Run, 
Cage, Plant) and treatment factors (Clone, Treatment, FDose) with the actual 
dose (variate Dose) and number of nymphs after 2 days (variate Nymphs). 

Write down the structural and explanatory components of the model for this 
experiment in terms of the explanatory factors. Consider carefully whether dose 
should be crossed with the other factors or nested within treatment. Translate 
your model into a linear mixed model and fit it, checking the model assump- 
tions. Investigate whether the response to dose is a linear function, and whether 
this relationship differs between insecticides or clones or both. Identify and 
present a predictive model to summarize the results of this experiment.* * 

16.5 The efficacies of six insecticidal treatments against aphids on vegetable brassicas 
were compared against an untreated control in a field trial. The trial comprised 
four complete replicates of seven plots, arranged as a grid with seven rows and 
four columns. The seven treatments were allocated to plots with a balanced row- 
column design, so that each treatment occurred once in every column (replicate) 
and in four different rows. Each plot was split into two, with two crops (cauli- 
flower or savoy cabbage) allocated to the halves at random. Within each half-plot, 
10 of the central (guarded) plants were sampled 14 days after the second spray 
application, and the number of peach-potato aphids on each plant was counted. 
The unit numbers (ID), structural factors (Row, Column, Plot, Halfplot, Plant), 
treatment factors (Insecticide, Crop) and response (variate Aphids) are held in file 
rowcol.dat. There were 15 missing plants in this sample. Write down the 
structural and explanatory models and compare two methods of analysis: 
multi-stratum ANOVA using the Healy-Westmacott algorithm for estimation of missing 
values, and use of linear mixed models, with the missing observations omitted. 
Compare the results of the two analyses and discuss any differences.^ 

Data from S. Foster, Rothamsted Research. 

* Data from R. Collier, University of Warwick. 

16.6 In Exercise 14.4, you constructed a model to predict tree total aerial biomass 
(TAB, data file slash.dat). That analysis ignored the fact that several trees were 
sampled from each inventory plot. We will now incorporate this structure into 
the model. 

a. Write down the structural component of the model for these data. Consider 
which of the explanatory variables are assessed (predominantly) at each 
level of the structure. Discuss whether you could successfully model this 
structure using an intra-block analysis. 

b. Using the structural component as random terms in a linear mixed model, 
fit all of the explanatory variables and perform backwards selection to iden- 
tify a new predictive model. Interpret your model. Is there any evidence of 
correlation among measurements on trees from the same plot? What impact 
has accounting for this structure made to your predictive model? 

16.7 In Exercise 14.6(d), you developed a MLR for the combined EXAMINE data set 
for years 1995 and 1996 (file examine9596.dat). Now, we use linear mixed 
models to incorporate the crossed year × trap structure of this data set. 

a. First, identify the level of the structure at which each explanatory variable 
shows variation, i.e. across years, across traps, or across year × trap 
combinations. How should this affect tests of the explanatory variables? 

b. Fit a baseline linear mixed model with random terms Year*Trap and no 
fixed terms, and take note of the estimated variance components. 

c. Add the terms you identified for the joint MLR in Exercise 14.6 into the 
fixed model. How do the estimated variance components change when 
these terms are added into the model? Use marginal approximate F-tests to 
decide whether all of the fixed terms are still required. How do these tests 
differ from those obtained in Exercise 14.6(d)? Write down your final model. 
What percentage of the variation does this final model account for? 

16.8 New insect repellent compounds require testing in the field, and this process is 
complicated by large variations in insect abundance over both space and time. 
File midge.dat holds the results from a trial to test a potential repellent 
compound against the Scottish biting midge (variables ID, Day, Run, Tent, Volunteer, 
Treatment and Total). The trial used two tents (A and B) in different parts of the 
same location, with several runs during each evening of three consecutive days. 
During each run, one volunteer was allocated to each tent with either the test 
formula or a positive (known active compound) or negative (blank) control (the 
three treatments). The number of midges entering the tent was counted from 
4-20 min after the start of the trial, and the total number was recorded. After 
an inter-run period of 20-30 min, the process was repeated with a different 
volunteer and compound in each tent. There were six volunteers available, and 
these were allocated to tents and compounds in as balanced a way as possible, 
but the resulting design is unbalanced.’ 

a. Write down the structure of this trial in terms of the Day, Run and Tent 
factors. Consider which factors are nested and which are crossed and iden- 
tify the residual term. How much information is there at each level of this 
structure? 



Data from J. Pickett, Rothamsted Research, A.J. Mordue, Aberdeen University, and J. Logan, London School of 
Hygiene and Tropical Medicine. 

b. Use tables of counts to investigate the allocation of volunteers and treatments 
to days, tents and runs. Examine the replication of volunteer x treatment 
combinations. Is it possible to get sensible estimates for all combinations? 
What happens if you fit an effect for an unreplicated combination? 

c. Set up a linear mixed model with random term Day.Run and fixed terms 
Day*Tent + Volunteer + Treatment. Use your answers to parts (a) and (b) 
to justify this model. Fit this model and use diagnostic plots to check the 
assumptions. Refit with a transformation if necessary. When you are sat- 
isfied with the model fit, interpret the estimated variance components. Is 
there any evidence that the test treatment repels midges? Identify and inter- 
pret a predictive model for this trial. 

d. Discuss what information from these results can be used to design future 
trials of this type. What principles would you recommend future designs 
should follow? 



17 

Models for Curved Relationships 



The regression models discussed in the previous chapters fitted straight line relationships 
between a response variable and one or more explanatory variates. In many situations, 
this type of model adequately reflects the observed pattern, but sometimes a curved 
relationship is observed, or there might be a biological or physical reason for a curved 
relationship. Fitting a straight line will then produce an inadequate model and an alternative 
approach should be sought. In this chapter, we consider some simple techniques for fitting 
curved relationships in terms of one or more explanatory variates. 

First, we describe approaches for a single explanatory variate that stay within the 
framework of linear regression (Section 17.1). The simplest approach to deal with curved 
relationships is transformation of the explanatory variate, so that a new transformed variate 
is used in place of the original (Section 17.1.1). A slightly different approach uses a 
combination of transformations of the explanatory variate together in a MLR model to create 
a curved relationship. This is often done with low-order polynomial functions (Section 
17.1.2) or trigonometric functions (Section 17.1.3). We then extend these approaches to 
the case of two explanatory variates that act together (rather than independently) on the 
response, so that interaction between the explanatory variates is required to generate an 
appropriate curved surface (Section 17.2). Finally, non-linear regression is a more 
sophisticated approach that allows a wider range of models to be fitted (Section 17.3). However, 
this approach requires a different set of numerical and statistical techniques, which are 
mathematically and computationally more complex than those used in linear regression, 
and which we describe only briefly here. 



17.1 Fitting Curved Functions by Transformation 

We first consider transformations of a single explanatory variate as a means to produce 
curved relationships. A transformation of either the response or the explanatory vari- 
ate changes the shape of the relationship. However, transformation of the response also 
changes the characteristics of the deviations, such as homogeneity of variance, as dis- 
cussed in Section 6.1. For this reason, we use transformation of the response as a tool to 
find a scale that meets the underlying assumptions for the deviations, and we use trans- 
formation of the explanatory variate as a tool to manipulate the shape of the relationship 
with the response. In this section, we concentrate on the latter. 



17.1.1 Simple Transformations of an Explanatory Variate 

The aim of a simple transformation of the explanatory variate is to find a scale on which 
the relationship with the response becomes a straight line. The first step in the process is 
to plot the response against the potential transformation of the explanatory variate to see 
whether a straight line relationship is plausible on that scale. If it is, then the second step 
is to fit the model in terms of the transformed explanatory variate and to check for any 
signs of model misspecification (see Section 13.1). Alternative forms of transformation can 
be compared formally with goodness-of-fit statistics (such as adjusted $R^2$) and by graphical 
inspection of the different model fits or, if sufficient data are present, by cross-validation 
(Section 14.9.3). 

The most common transformations of an explanatory variate (x) used in this context are 
the square root ($\sqrt{x} = x^{0.5}$), square ($x^2$), logarithm ($\log_e(x)$ or $\log_{10}(x)$), 
exponential ($\exp(x)$) and reciprocal ($1/x$) transformations. Typical shapes for these 
functions are shown in Figure 17.1. Trigonometric functions, for example, sin(x) or cos(x) or 
both, or other powers, for example, $x^3$, can also be used, and these are discussed in more 
detail in Sections 17.1.2 and 17.1.3. The modelling procedure entails calculation of the 
transformed variate, for example, $w = \sqrt{x}$, then a simple linear regression (SLR) model is 
fitted with the transformed variate w as the explanatory variate in the model. 
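
The two-step procedure described above (calculate the transformed variate, then fit a SLR 
with it) can be sketched in Python as follows. This is a minimal illustration only: the 
function name fit_slr and the small synthetic data set are assumptions introduced here, not 
part of the book's examples. 

```python
import math

def fit_slr(w, y):
    """Least-squares intercept and slope for the straight line y = a + b*w."""
    n = len(w)
    w_bar = sum(w) / n
    y_bar = sum(y) / n
    s_ww = sum((wi - w_bar) ** 2 for wi in w)
    s_wy = sum((wi - w_bar) * (yi - y_bar) for wi, yi in zip(w, y))
    slope = s_wy / s_ww
    intercept = y_bar - slope * w_bar
    return intercept, slope

# Hypothetical data lying exactly on the curve y = 2 + 3*sqrt(x)
x = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0]
y = [2.0 + 3.0 * math.sqrt(xi) for xi in x]

# Step 1: calculate the transformed variate w = sqrt(x)
w = [math.sqrt(xi) for xi in x]

# Step 2: fit the SLR with w (not x) as the explanatory variate
intercept, slope = fit_slr(w, y)
print(round(intercept, 6), round(slope, 6))
```

With real data the points will not lie exactly on the curve, so the fit should still be 
checked graphically and with residual plots on the transformed scale. 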




FIGURE 17.1 

Typical shape of response (y) for simple functions of an explanatory variate (x): (a) $y = x^2$, 
(b) $y = 0.25e^x$, (c) $y = x^{0.5}$, (d) $y = \log_e(x)$, (e) y = …, (f) $y = 1/x$. In each case, the 
line shows the underlying curve and the points show a sample of 40 observations taken from 
the underlying function plus Normal deviations with common variance. 

Some of these functions are valid only for positive values of the explanatory variate 
($x > 0$, e.g. for $w = \log_e(x)$, $\log_{10}(x)$ or $1/x$), and some are valid only for non-negative 
values ($x \ge 0$, e.g. for $w = \sqrt{x}$). If some of the values of the explanatory variate are 
outside the range allowed, for example, if $x_i = 0$ for a log transformation, then a pragmatic 
solution is to add a positive offset c to all the values of the explanatory variate, for 
example, $\log_{10}(x + c)$. 
The choice of a sensible offset value was discussed in Section 6.2 in the context of 
transformation of the response. Here, the aim is to find an offset such that all of the values of 
the explanatory variate fall within the allowed range and the relationship between the 
response and transformed explanatory variate is a straight line. Different values of the 
offset can have a large impact on the shape of the curve and should be evaluated both 
graphically and with goodness-of-fit statistics. 
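
As a rough sketch of how candidate offsets might be compared with a goodness-of-fit 
statistic, the Python fragment below computes $R^2$ for a SLR on $\log_{10}(x + c)$ over a few 
values of c. The data, the list of candidate offsets and the function name r_squared are all 
hypothetical, chosen so that the true offset is known; graphical checks would still be needed 
in practice. 

```python
import math

def r_squared(w, y):
    """Proportion of the variation in y explained by a straight line in w."""
    n = len(w)
    w_bar, y_bar = sum(w) / n, sum(y) / n
    s_ww = sum((wi - w_bar) ** 2 for wi in w)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_wy = sum((wi - w_bar) * (yi - y_bar) for wi, yi in zip(w, y))
    return s_wy ** 2 / (s_ww * s_yy)

# Hypothetical data including x = 0, generated from y = 1 + 2*log10(x + 1)
x = [0.0, 1.0, 2.0, 5.0, 10.0, 20.0, 50.0]
y = [1.0 + 2.0 * math.log10(xi + 1.0) for xi in x]

# Candidate offsets (illustrative values only); each makes x + c positive
offsets = [0.1, 0.5, 1.0, 2.0, 5.0]
fits = {c: r_squared([math.log10(xi + c) for xi in x], y) for c in offsets}

# The offset giving the straightest relationship has the largest R-squared
best = max(fits, key=fits.get)
for c in offsets:
    print(c, round(fits[c], 4))
print("best offset:", best)
```

Because every candidate model has the same number of parameters, ranking by $R^2$ here is 
equivalent to ranking by adjusted $R^2$. 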

Predictions of the fitted line, together with standard errors (SEs) and confidence inter- 
vals (CIs), can be calculated in terms of the transformed variable (w) as for SLR (Section 
12.5). These predictions, SEs and CIs also apply directly to the original explanatory variate 
(x); this is illustrated in Example 17.1. 



EXAMPLE 17.1A: OLSEN P 

The exhaustion land long-term field trial at Rothamsted Research has been used to 
investigate the relationship between crop yields and applications of soil fertilizer. The 
data in Figure 17.2a, Table 17.1 and file phosphorus.dat are yields of spring barley from 
20 plots in 1986 (variate Yield) with the available soil phosphorus content measured as 
Olsen P (variate OlsenP). 

FIGURE 17.2 

Yield plotted against (a) Olsen P phosphorus content, (b) $\log_{10}$(Olsen P) per plot for exhaustion land trial in 1986 
(Example 17.1A). 

TABLE 17.1 

Yield and Olsen P Measurements from the Exhaustion Land Experiment at Rothamsted Research 
in 1986 (See Example 17.1A and File phosphorus.dat) 

Olsen P   Yield     Olsen P   Yield     Olsen P   Yield     Olsen P   Yield 
    6.8    4.05        10.9    4.70         3.7    3.49         4.8    3.08 
    5.8    3.89         9.5    4.47         2.0    1.90        12.9    4.03 
    3.2    3.55         6.1    3.50         9.2    4.35        13.8    4.41 
    1.5    1.88         6.1    3.82         9.8    4.41         8.4    4.04 
   11.7    4.30         4.4    4.01         6.6    3.96         4.7    3.26 

Source: Data from P. Poulton, Rothamsted Research. 

There is no suggestion of variance heterogeneity in yield, so there is no reason to 
transform the observed response. However, there is clear curvature in the 
relationship. An alternative plot of yield against the log-transformed Olsen P values 
(LogOP = $\log_{10}$(OlsenP), Figure 17.2b) appears to give a straight line. The transformed 
variate can therefore be used in a SLR model of the form 

$Yield_i = \alpha + \beta\,LogOP_i + e_i$, 

where $Yield_i$ is the yield, $LogOP_i$ is the $\log_{10}$-transformed value of Olsen P and $e_i$ is the 
deviation for the ith plot, i = 1 ... 20. The slope of the straight line is $\beta$, representing the 
increase in yield for an increase of one unit of $\log_{10}$(Olsen P), and $\alpha$ is the intercept of this 
straight line. In symbolic form, the model can be written as 

Response variable: Yield 

Explanatory component: [1] + LogOP 

This model accounts for 79.7% of the variation in the data (adjusted $R^2$ = 0.797) and 
the F-test from the ANOVA shows a strong association of yield with $\log_{10}$(Olsen P) 
($F_{1,18}$ = 75.410, P < 0.001). The parameter estimates are shown in Table 17.2, giving the 
predictive model 

$\hat{\mu}(LogOP) = 1.674 + 2.644\,LogOP$. 

We can rewrite this predictive model in terms of the untransformed explanatory 
variable as 

$\hat{\mu}(OlsenP) = 1.674 + 2.644\,\log_{10}(OlsenP)$, 

and the parameter SEs still apply on this scale. The intercept predicts the response when 
$\log_{10}(OlsenP) = 0$, corresponding to OlsenP = 1 on the original scale. This model gives an 
increase of 2.64 units in yield for one unit of increase in $\log_{10}(OlsenP)$. Since a one unit 
increase on the $\log_{10}$ scale is equivalent to a 10-fold increase on the original scale, this 
implies that a 10-fold increase in Olsen P would predict a 2.64 unit increase in yield. In 
practice, this model applies only across a sevenfold increase, as the Olsen P 
measurements range from 2 to 14 units. 

Prediction at the mean value of Olsen P = 7.095, with $\log_{10}(OlsenP) = 0.8510$, can then be 
made, using the notation of Section 12.5, as 

$\hat{\mu}(OlsenP = 7.095) = 1.674 + (2.644 \times 0.8510) = 3.923$. 



TABLE 17.2 

Parameter Estimates with Standard Errors (SE), t-Statistics (t) and Observed Significance Levels (P) 
for the SLR for Yield with Explanatory Variate, LogOP = $\log_{10}$(OlsenP) (Example 17.1A) 

Term     Parameter   Estimate   SE       t       P 
[1]      $\alpha$    1.674      0.2518   6.646   < 0.001 
LogOP    $\beta$     2.644      0.3044   8.684   < 0.001 

FIGURE 17.3 

Yield with fitted model ( — ) and 95% CI (— ) plotted against (a) $\log_{10}$(Olsen P) and (b) Olsen P values 
(Example 17.1A). 

We can use the ResMS from the ANOVA ($s^2$ = 0.1191) and the results of Section 12.5 to 
form a 95% CI for this prediction as (3.756, 4.090). In Figure 17.3, the fitted line is 
plotted with 95% CI against both the log-transformed and the original Olsen P values. On 
the original scale, the fitted straight line becomes a curve, and the characteristic shape 
of the CI appears to change, although the width of the CI is the same at the equivalent 
points of the two x-axes. 
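
The calculations in this example can be retraced with the short Python sketch below, which 
fits the SLR on $\log_{10}$(Olsen P) to the values of Table 17.1 and forms the prediction and 
95% CI at the mean Olsen P. It is an illustration rather than the software used for the book: 
the variable names are assumptions, and the t critical value (2.101 on 18 df) is hard-coded 
from standard tables instead of being computed. 

```python
import math

# Olsen P and yield values from Table 17.1 (file phosphorus.dat), read in row order.
# The 15th Olsen P value (9.8) is the one consistent with the stated mean of 7.095.
olsen_p = [6.8, 10.9, 3.7, 4.8, 5.8, 9.5, 2.0, 12.9, 3.2, 6.1,
           9.2, 13.8, 1.5, 6.1, 9.8, 8.4, 11.7, 4.4, 6.6, 4.7]
yield_ = [4.05, 4.70, 3.49, 3.08, 3.89, 4.47, 1.90, 4.03, 3.55, 3.50,
          4.35, 4.41, 1.88, 3.82, 4.41, 4.04, 4.30, 4.01, 3.96, 3.26]

n = len(olsen_p)
log_op = [math.log10(v) for v in olsen_p]  # transformed variate LogOP
w_bar = sum(log_op) / n
y_bar = sum(yield_) / n
s_ww = sum((w - w_bar) ** 2 for w in log_op)
s_wy = sum((w - w_bar) * (y - y_bar) for w, y in zip(log_op, yield_))

# Least-squares estimates for Yield = intercept + slope * LogOP
slope = s_wy / s_ww
intercept = y_bar - slope * w_bar

# Residual mean square on n - 2 = 18 df
res_ss = sum((y - intercept - slope * w) ** 2 for w, y in zip(log_op, yield_))
res_ms = res_ss / (n - 2)

def predict_with_ci(x0, t_crit=2.101):
    """Prediction and 95% CI for the fitted line at Olsen P = x0.

    t_crit is the tabulated 97.5% point of the t-distribution on 18 df
    (a hard-coded assumption to keep the sketch free of dependencies).
    """
    w0 = math.log10(x0)
    fit = intercept + slope * w0
    se = math.sqrt(res_ms * (1.0 / n + (w0 - w_bar) ** 2 / s_ww))
    return fit, fit - t_crit * se, fit + t_crit * se

fit, lower, upper = predict_with_ci(sum(olsen_p) / n)
print(f"intercept={intercept:.3f} slope={slope:.3f} s2={res_ms:.4f}")
print(f"prediction={fit:.3f} 95% CI=({lower:.3f}, {upper:.3f})")
```

Because the CI is computed on the transformed scale and then simply plotted against 
either LogOP or Olsen P, its limits at a given plot are identical on the two x-axes; only 
the horizontal spacing changes. 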

Although the fitted model follows the overall trend in the data, there is a suggestion of 
model misspecification, as the fitted line lies above the observed yield at the extremes of 
the range. This is also clear in graphs of standardized residuals against the explanatory 
variate on either the transformed (Figure 17.4a) or original (Figure 17.4b) scale. Again, 
the same residuals are plotted in both parts of Figure 17.4, but the scale of the x-axis (and 
hence the trend line) has changed. 



FIGURE 17.4 

Residuals from fitted model with trend line plotted against (a) $\log_{10}$(Olsen P) and (b) Olsen P values 
(Example 17.1A). 

Finding an adequate transformation can be difficult and will not always be possible. In 
some cases, several models based on different transformations of the explanatory variate 
may appear plausible. The model chosen should give a good visual fit to the observations, 
show no evidence of misspecification, and should perform well with goodness-of-fit 
statistics. All else being equal, transformations with a simple biological interpretation 
should be preferred. However, remember that a biological interpretation is not essential 
for a purely descriptive model, as long as the results are interpreted appropriately. The 
more complex approaches described below should be considered if the relationship cannot 
be captured by a single transformed explanatory variate. 

Simple transformation of one or more explanatory variates to achieve straight line 
relationships can also be useful in the context of MLR models (Chapter 14). Once a 
suitable transformation is identified, models are selected from the transformed explanatory 
variate(s). Take care, however, as the correlation between explanatory variates may distort 
the shape of individual relationships with the response, and the diagnostics of Section 14.6 
can help to detect this. You can also extend MLR models by the addition of interactions 
between explanatory variates; this is discussed in Section 17.2. 



17.1.2 Polynomial Models 

It is often difficult to find a single transformation of an explanatory variate that can 
adequately describe a curved relationship. Polynomial regression models use several powers 
of the explanatory variate to introduce curvature into a relationship via a MLR model. 
Here, we consider only positive integer powers, i.e. $x^q$, where q is a whole number. The 
order of a polynomial model is equal to the highest power of the explanatory variate 
used. These models have the advantage that they are very flexible and can incorporate 
patterns of both increasing and decreasing response within a model, i.e. non-monotonic 
functions. Their major disadvantage is that high-order polynomials can lead to over-fitting, so 
that interpolation between observations can be unreliable. In addition, extrapolation may 
be unreliable even for low-order polynomials (see discussion later in this section). 

The SLR model is the simplest case of a polynomial model, i.e. a straight line. 
Higher-order polynomial models are obtained by the addition of power transformations of the 
explanatory variate, such as $x^2$ or $x^3$, into the model. For example, a second-order 
polynomial, or quadratic model, includes the second power or square of the explanatory variate, 
and takes the form 

$y_i = \alpha + \beta_1 x_i + \beta_2 x_i^2 + e_i$. 

This model is fitted by the construction of a new explanatory variate with values equal to 
$x_i^2$, then by the fitting of a MLR model with two explanatory variates: the original and the 
squared values. So, if variate x contains the original values and $x^2$ contains the squared 
values, this MLR model is written in symbolic form as 

Explanatory component: [1] + $x$ + $x^2$ 

A polynomial model of order q has p = q + 1 parameters and can be written as 

$y_i = \alpha + \beta_1 x_i + \beta_2 x_i^2 + \ldots + \beta_{q-1} x_i^{q-1} + \beta_q x_i^q + e_i$. 

The sequential ANOVA table for a polynomial model of order q starts with the SLR model 
and then successively adds increasing powers of the explanatory variate into the model, 
giving incremental sums of squares and F-tests (see Section 14.4). Each power of the 
explanatory variate is associated with 1 df, and so a polynomial model of order q has q 
df. Predictions and CIs can be formed from the fitted model as for any MLR (Section 14.5). 
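
The construction described above, in which each power of x is treated as a separate 
explanatory variate in a MLR, can be sketched in Python as follows. The function 
fit_polynomial and the synthetic data are hypothetical, and solving the normal equations 
directly is used here only to keep the sketch self-contained; statistical software 
generally uses more stable numerical methods. 

```python
def fit_polynomial(x, y, order):
    """Least-squares coefficients [a, b1, ..., bq] of a polynomial of the
    given order, treating x, x^2, ..., x^q as separate MLR variates."""
    p = order + 1
    # Design matrix: a constant column followed by one column per power of x
    design = [[xi ** k for k in range(p)] for xi in x]
    # Normal equations (X'X) b = X'y, stored as an augmented matrix
    aug = [[sum(row[j] * row[k] for row in design) for k in range(p)]
           + [sum(row[j] * yi for row, yi in zip(design, y))]
           for j in range(p)]
    # Gaussian elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, p):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back-substitution
    coef = [0.0] * p
    for j in reversed(range(p)):
        tail = sum(aug[j][k] * coef[k] for k in range(j + 1, p))
        coef[j] = (aug[j][p] - tail) / aug[j][j]
    return coef

# Hypothetical data lying exactly on the quadratic y = 1 + 2x - 0.5x^2
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0 + 2.0 * xi - 0.5 * xi ** 2 for xi in x]

coef = fit_polynomial(x, y, order=2)
print([round(c, 6) for c in coef])
```

The columns of the design matrix become strongly correlated as the order increases, so 
this direct approach is only reasonable for low-order polynomials. 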



EXAMPLE 17.1B: OLSEN P 

Here, we consider polynomial regression as an alternative to the $\log_{10}$ transformation 
tried in Example 17.1A. We fit a cubic polynomial, written in symbolic form as 

Response variable: Yield 

Explanatory component: [1] + OlsenP + $(OlsenP)^2$ + $(OlsenP)^3$ 

where the variate Yield holds the observations, OlsenP is the variate holding the Olsen 
P values and $(OlsenP)^2$ and $(OlsenP)^3$ represent variates holding the squared and cubed 
values of the OlsenP variate. Table 17.3 shows the sequential ANOVA table for this 
model, which accounts for 81.0% of the variation (adjusted $R^2$ = 0.810). 

The incremental F-tests for the first two terms (linear and quadratic, $F^L_{1,16}$ = 
62.063, $F^Q_{1,16}$ = 20.559, both P < 0.001) are significant, but the test for the cubic term is not 
($F^C_{1,16}$ = 1.183, P = 0.293). As the cubic term is added into the model last, its incremental 
F-test is also a marginal F-test and indicates that the fit of the model is not significantly 
worse if this term is dropped. We therefore drop this cubic term, and fit a quadratic 
model that accounts for 80.8% of the variation in the data (adjusted $R^2$ = 0.808), with 
all terms significant. The parameter estimates for the quadratic model are listed in 
Table 17.4. 
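
The variance ratios and adjusted $R^2$ quoted above follow directly from the sums of squares 
in Table 17.3. The Python fragment below retraces that arithmetic as a check; the variable 
names are illustrative assumptions, and the observed significance levels are not computed. 

```python
# Incremental sums of squares from Table 17.3 (each added term has 1 df)
incremental_ss = {"linear": 6.9157, "quadratic": 2.2909, "cubic": 0.1318}
res_ss, res_df = 1.7829, 16      # residual line of the cubic model
total_ss, total_df = 11.1213, 19

# Residual mean square of the full (cubic) model
res_ms = res_ss / res_df

# Each variance ratio is the incremental mean square (SS / 1 df)
# divided by the residual mean square
f_stats = {term: (ss / 1) / res_ms for term, ss in incremental_ss.items()}

# Adjusted R-squared = 1 - ResMS / TotalMS
adj_r2_cubic = 1 - res_ms / (total_ss / total_df)

# Dropping the cubic term returns its SS and df to the residual
res_ms_quad = (res_ss + incremental_ss["cubic"]) / (res_df + 1)
adj_r2_quad = 1 - res_ms_quad / (total_ss / total_df)

for term, f in f_stats.items():
    print(f"{term}: F(1,{res_df}) = {f:.3f}")
print(f"adjusted R2: cubic = {adj_r2_cubic:.3f}, quadratic = {adj_r2_quad:.3f}")
```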

Figure 17.5 shows the fitted quadratic and cubic polynomial models with 95% CIs. A 
difference in the fit of the two models appears for larger values of Olsen P, where the 
yield is stable: the quadratic model starts to move downwards, whereas the cubic model 
stays level. The 95% CIs are narrower in the centre of the range of the explanatory vari- 
ate, and get much wider at the ends of the range (like the SLR models in Section 12.5). 
The CIs for the cubic model are wider than those for the quadratic model because the 
extra term reduces the ResSS only a small amount while introducing another estimated 
parameter with its associated uncertainty. 
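
The downturn of the quadratic curve can be located from the parameter estimates in 
Table 17.4, since a quadratic with a negative squared term reaches its maximum at 
$x = -\beta_1 / (2\beta_2)$. The sketch below uses the rounded estimates as reported, so the 
result is approximate; the function name fitted_yield is an assumption made for illustration. 

```python
# Quadratic parameter estimates as reported (rounded) in Table 17.4
a, b1, b2 = 1.283, 0.593, -0.0279

def fitted_yield(olsen_p):
    """Fitted quadratic curve for yield as a function of Olsen P."""
    return a + b1 * olsen_p + b2 * olsen_p ** 2

# A quadratic a + b1*x + b2*x^2 with b2 < 0 has its maximum at x = -b1 / (2*b2)
turning_point = -b1 / (2 * b2)

print(f"maximum at Olsen P = {turning_point:.2f}, "
      f"fitted yield = {fitted_yield(turning_point):.2f}")
# Beyond the turning point the fitted yield decreases, e.g. at the largest observed value
print(f"fitted yield at Olsen P = 13.8: {fitted_yield(13.8):.2f}")
```

The turning point falls inside the observed range of Olsen P, which is why the decline of 
the quadratic fit is visible in Figure 17.5a. 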

We check residual plots for evidence of model misspecification. Figure 17.6 shows 
standardized residuals from the quadratic and cubic models plotted against the explan- 
atory variate. There is a suggestion of misspecification in both graphs, particularly at 
the smallest values of Olsen P. 



Problems of collinearity can occur in polynomial models, particularly for higher-order
models. For example, in Example 17.1B, large VIFs (> 100, see Section 14.7) are obtained when
one fits the cubic polynomial. This can be avoided by the use of orthogonal polynomials



TABLE 17.3

Sequential ANOVA for a Cubic Polynomial Model for Yield with Explanatory Variate Olsen P
(Example 17.1B)

Term Added      Incremental df   Incremental SS   Mean Square   Variance Ratio      P
+ OlsenP              1              6.9157          6.9157     $F^{L}$ = 62.063    < 0.001
+ (OlsenP)^2          1              2.2909          2.2909     $F^{Q}$ = 20.559    < 0.001
+ (OlsenP)^3          1              0.1318          0.1318     $F^{C}$ = 1.183     0.293
Residual             16              1.7829          0.1114
Total                19             11.1213          0.5853

Note: SS = sum of squares.

TABLE 17.4

Parameter Estimates with Standard Errors (SE), t-Statistics (t) and Observed Significance Levels (P)
for a Quadratic Polynomial Model for Yield with Explanatory Variate Olsen P (Example 17.1B)

Term          Parameter     Estimate    SE         t         P
[1]           $\alpha$       1.283      0.3290     3.900     0.001
OlsenP        $\beta_1$      0.593      0.0964     6.156     < 0.001
(OlsenP)^2    $\beta_2$     -0.0279     0.00618   -4.510     < 0.001





FIGURE 17.5 

Observed yield and fitted curves ( — ) with 95% CI ( — ) for (a) quadratic and (b) cubic polynomial models
(Example 17.1B).




FIGURE 17.6 

Residuals with trend line ( ) plotted against the explanatory variate for (a) quadratic and (b) cubic polynomial 

models (Example 17.1B). 



rather than simple powers of the explanatory variate (see also Section 8.7). Orthogonal 
polynomials are constructed so that the qth function is of order q and is orthogonal to all of 
the lower-order functions. Figure 8.8 showed simple powers alongside the corresponding 
set of orthogonal polynomials. Orthogonal polynomials have zero pairwise correlations 
and so produce a stable model without collinearity problems. Their major disadvantage is
that the estimated parameters are difficult to interpret, as the coefficients no longer relate
to individual powers of the explanatory variate. We can deal with this problem by using
orthogonal polynomials to establish the predictive model, and then refit it with simple
powers of the explanatory variate to obtain interpretable parameter estimates. More
information on orthogonal polynomials can be found in Bliss (1970). Alternatively, centering the
original explanatory variate (see Section 12.9.1) before taking powers may reduce
collinearity sufficiently to enable a stable model to be fitted (see Example 17.3B).
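The effect of centering on collinearity can be illustrated with a small Python sketch. The explanatory values below are hypothetical (they are not the Olsen P data), and the function names are our own. The sketch compares the correlation between a variate and its square before and after centering, and converts each correlation to the VIF that applies when a model contains just these two explanatory variates, VIF = 1/(1 - r^2).

```python
import math


def pearson(u, v):
    """Pearson correlation between two equal-length lists."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_vv = sum((b - mean_v) ** 2 for b in v)
    return s_uv / math.sqrt(s_uu * s_vv)


def vif(r):
    """VIF for either of two explanatory variates with correlation r."""
    return 1.0 / (1.0 - r ** 2)


# Hypothetical, strictly positive explanatory variate (an assumption).
x = [1.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0]
x_bar = sum(x) / len(x)
cx = [xi - x_bar for xi in x]  # centered variate

r_raw = pearson(x, [xi ** 2 for xi in x])
r_centered = pearson(cx, [ci ** 2 for ci in cx])

print(f"raw:      r = {r_raw:.3f}, VIF = {vif(r_raw):.1f}")
print(f"centered: r = {r_centered:.3f}, VIF = {vif(r_centered):.2f}")
```

Centering does not make the powers exactly orthogonal unless the values are symmetric about their mean, but for values like these it removes most of the correlation between the linear and quadratic terms.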

In general, polynomial models should be constructed sequentially by progressive
addition of terms of higher orders, assisted by graphical inspection of the fit at each stage and
sequential ANOVA tables. As in previous chapters, the aim is to produce a parsimonious
model, i.e. a model of the lowest possible order that describes adequately the relationship
between the response and explanatory variate. The model fitting process starts with a
SLR model and higher-order powers of the explanatory variate are successively added.
At each stage, lack of fit is tested if replicate observations are available (Section 12.8) and
the fitted model and residual plots are examined visually for evidence of
misspecification. If the fit appears inadequate, the next higher-order term can be added. If the fit
appears good, then no further terms need to be added and the current model should
be checked. The need for the highest-order term in the model should be verified by the
use of a marginal F-test from the sequential ANOVA table. If this term is not statistically
significant (e.g. P > 0.05), then it should be omitted, and a lower-order model will suffice.
Once a suitable order for the polynomial has been established, all lower-order terms are
retained in the model, even if not statistically significant. This strategy follows from our
arguments on marginality (see Sections 8.3 and 15.5): we consider any lower-order power
($x^k$ with $k < q$) to be marginal to a higher-order power ($x^q$). This also implies that we should
fit lower powers of the explanatory variate before higher powers, as described above.
Following this principle also ensures that the model can be translated to other scales
if required, for example, from a model in terms of orthogonal polynomials to a model
based on simple powers; an exact translation may not be possible if lower-order terms are
omitted.

Polynomial models are essentially descriptive models, as there is rarely a biological
interpretation for models of this form. An advantage of low-order polynomials is that they
can flexibly adapt to follow the form of the relationship. However, as there is no constraint
on the form of the curve outside the range of the explanatory variate, one should never
extrapolate with these models.

As the order of the polynomial increases the residual sum of squares will decrease,
and the fitted curve will pass closer to the observations. In fact, for a data set with k
distinct values of the explanatory variate, a polynomial model of order k-1 will pass
through the mean at each value of the explanatory variate. If the observed values of the
explanatory variate are unreplicated, then this curve fits each observation exactly. This
perfect fit is counterproductive, as the model often becomes unreliable for interpolation
as it attempts to accommodate detailed, and probably random, patterns in the
relationship; this behaviour is known as over-fitting (see also Section 14.9). For example, Figure
17.7 shows a polynomial of order 8 fitted to the yield observations from Example 17.1.
The fitted model has adapted to be much closer to the observations than the lower-order
polynomials (Figure 17.5), but the interpolated model shows an unrealistic shape,
particularly with respect to the sharp dips around Olsen P values of 1 and 13. This graph
shows the importance of evaluating complex curved models at a dense set of
explanatory variate values, as the full form of the curve (and any over-fitting) may not be
apparent from the fit at the observed values of the explanatory variate.

FIGURE 17.7

Observed yield from Example 17.1 with fitted polynomial model of order 8 ( — ) and 95% CI ( — ).
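The exact fit of a polynomial of order k-1 to k unreplicated observations can be demonstrated with a short Python sketch. For unreplicated data, the least squares polynomial of order k-1 coincides with the interpolating polynomial, so the sketch builds that polynomial directly in Lagrange form instead of fitting a regression. The observations are hypothetical (a flat response with small perturbations), and the function name is our own.

```python
def interpolating_polynomial(xs, ys):
    """Return f(x) for the unique polynomial of order len(xs) - 1 that
    passes through every (x, y) pair, written in Lagrange form."""
    def f(x):
        total = 0.0
        for j, (xj, yj) in enumerate(zip(xs, ys)):
            weight = 1.0
            for m, xm in enumerate(xs):
                if m != j:
                    weight *= (x - xm) / (xj - xm)
            total += yj * weight
        return total
    return f


# Hypothetical unreplicated observations (k = 7 distinct x values).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [5.0, 5.2, 4.9, 5.1, 5.0, 4.8, 5.1]
curve = interpolating_polynomial(xs, ys)

# The order k-1 polynomial reproduces every observation ...
max_abs_residual = max(abs(curve(x) - y) for x, y in zip(xs, ys))

# ... so the curve must be evaluated on a dense grid to see its shape
# between and beyond the observed values.
grid = [1.0 + 0.05 * i for i in range(121)]
fitted = [curve(x) for x in grid]

print("largest residual at observed x:", max_abs_residual)
print("observed range:", min(ys), "to", max(ys))
print("fitted range on dense grid:", round(min(fitted), 3), "to", round(max(fitted), 3))
```

Comparing the observed range with the range of the curve on the dense grid is a simple numerical counterpart to the visual check recommended above.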

Smoothing techniques provide an alternative to polynomial models, as they flexibly
adapt to follow a curved relationship. These techniques constrain the roughness of the
fitted curve and thus largely avoid problems with over-fitting. Regression splines can
be implemented directly as a MLR model, while smoothing splines or locally weighted
regression (loess) smoothers can be implemented as additive models with a penalized
likelihood approach. These models are outside the scope of this book, but Ruppert et al. (2003)
provide a good introduction.

17.1.3 Trigonometric Models for Periodic Patterns 

Trigonometric regression models are MLR models used to describe periodic cycles, and
are often used to model observations related to yearly or daily cycles, for example, mean
monthly temperature as shown in Example 17.2. The period of the cycles, i.e. the number
of time units corresponding to a full cycle, is assumed to be known and denoted as $\omega$.
Trigonometric regression models use sine and cosine transformations of the measurement
times as explanatory variates. Recall that the sine and cosine functions are cyclic over time
with a period of $2\pi$ radians. To convert our explanatory variate with period $\omega$ on to this
scale, we use the transformations $\sin(2\pi t/\omega)$ and $\cos(2\pi t/\omega)$, where t is a variate of observed
measurement times. For example, a simple trigonometric regression model for monthly
data that exhibit yearly cycles, with $\omega = 12$, takes the form

$$ y_i = \alpha + \beta_1 \sin(2\pi t_i/12) + \beta_2 \cos(2\pi t_i/12) + e_i , \qquad (17.1) $$

where $y_i$ is the ith observation made at time $t_i$ with deviation $e_i$, and the unknown model
parameters are $\alpha$, $\beta_1$ and $\beta_2$. This is a MLR model with two explanatory variates, which
are calculated as the sine and cosine functions of the observed times, $t_i$. This model can be
converted into a more interpretable form as a single sine function, written as

$$ y_i = \alpha + \gamma \sin(2\pi t_i/12 - \theta) + e_i . $$

In this form, the parameters are the average response over a full cycle ($\alpha$), the amplitude
of the sine curve ($\gamma$, equal to half of the range of the curve) and the phase of the curve ($\theta$).
The phase is the lag behind the standard sine curve (which has its maximum at $\pi/2$ and
minimum at $3\pi/2$ radians). The amplitude, $\gamma$, must always be non-negative ($\gamma \geq 0$). Using
standard results for trigonometric functions, we can expand the sine function in the
equation above to give

$$ y_i = \alpha + \gamma \sin(2\pi t_i/12)\cos(\theta) - \gamma \cos(2\pi t_i/12)\sin(\theta) + e_i . $$

If we set $\beta_1 = \gamma\cos(\theta)$ and $\beta_2 = -\gamma\sin(\theta)$, then this is the model in Equation 17.1. We can
therefore calculate the amplitude and phase of the fitted curve in terms of the original
parameters as

$$ \gamma = \sqrt{\beta_1^2 + \beta_2^2} , \qquad \theta = \tan^{-1}(-\beta_2/\beta_1) . $$

This estimate of the phase is in radians and we can convert it to the scale of measurement
by multiplying by $\omega/2\pi$. Although inference and SEs for estimates of the original
parameters ($\alpha$, $\beta_1$ and $\beta_2$) follow directly from properties of MLR models (Chapter 14), SEs for
estimates of $\gamma$ and $\theta$ are not straightforward, as these are non-linear functions of the original
parameters. Approximate SEs can be calculated in statistical software by the delta method
(see Casella and Berger, 2002).
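The conversion from the regression coefficients to the amplitude and phase can be sketched in Python as follows. The function name and the test values are our own. Instead of taking the inverse tangent of a ratio and then choosing between its two solutions, the sketch uses the two-argument arctangent, which uses the signs of $\gamma\sin(\theta) = -\beta_2$ and $\gamma\cos(\theta) = \beta_1$ to select the correct quadrant directly.

```python
import math


def amplitude_phase(beta1, beta2, period):
    """Convert a + b1*sin(2*pi*t/period) + b2*cos(2*pi*t/period)
    to a + gamma*sin(2*pi*t/period - theta).

    Returns (gamma, theta in radians within [0, 2*pi), phase in time units).
    """
    gamma = math.hypot(beta1, beta2)
    # beta1 = gamma*cos(theta) and beta2 = -gamma*sin(theta), hence
    # theta = atan2(-beta2, beta1), shifted into [0, 2*pi).
    theta = math.atan2(-beta2, beta1) % (2 * math.pi)
    return gamma, theta, theta * period / (2 * math.pi)


# Round trip with arbitrary illustrative values (an assumption, not data).
gamma_true, theta_true = 2.0, 4.0
b1 = gamma_true * math.cos(theta_true)
b2 = -gamma_true * math.sin(theta_true)
gamma, theta, lag = amplitude_phase(b1, b2, period=12)
print(gamma, theta, lag)
```

This gives point estimates only; standard errors for the amplitude and phase would still require an approximation such as the delta method.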

EXAMPLE 17.2: ROTHAMSTED MONTHLY MEAN TEMPERATURE 

The monthly mean temperatures at Rothamsted Experimental Station over the period 
1891-1990 are listed in Table 17.5 and can be found in file temperature.dat. For this 
response (held in variate Temperature), we expect a yearly cycle and so trigonometric 
regression is appropriate. 

The explanatory variate, Month, has values 1-12. To obtain cycles of period 12, i.e.
equal to 1 year, we calculate explanatory variates Sin and Cos as

Sin = sin(2π Month/12), Cos = cos(2π Month/12) .

The model can then be written in symbolic form as 

Response variable: Temperature 

Explanatory component: [1] + Sin + Cos 

TABLE 17.5

Monthly Mean Temperatures (°C) at Rothamsted (UK) over the Period 1891-1990 (See Example 17.2
and File temperature.dat)

Month        Month   Temperature      Month        Month   Temperature
January        1         3.1          July           7        16.0
February       2         3.4          August         8        15.7
March          3         5.3          September      9        13.5
April          4         7.7          October       10         9.8
May            5        11.1          November      11         5.9
June           6        14.0          December      12         4.0

Source: Data from Rothamsted Research.

TABLE 17.6

Parameter Estimates with Standard Errors (SE), t-Statistics (t) and Observed Significance Levels (P)
for a Trigonometric Regression for Monthly Mean Temperature at Rothamsted (Example 17.2)

Term    Parameter     Estimate    SE        t          P
[1]     $\alpha$        9.12      0.139     65.432     < 0.001
Sin     $\beta_1$      -4.09      0.197    -20.722     < 0.001
Cos     $\beta_2$      -5.13      0.197    -26.007     < 0.001




The fitted model accounted for 99.0% of the variation in the data (adjusted $R^2$ = 0.990),
and the parameter estimates are listed in Table 17.6.

The overall mean temperature is equal to the model intercept, which is estimated as
9.12°C. The amplitude of the fitted curve is calculated as

$$ \gamma = \sqrt{\beta_1^2 + \beta_2^2} = \sqrt{(-4.09)^2 + (-5.13)^2} = 6.56 , $$

hence the range of monthly temperatures is twice this value, equal to 13.12°C. Finally,
we can consider the phase. We have

$$ \theta = \tan^{-1}(-\beta_2/\beta_1) = \tan^{-1}[5.13/(-4.09)] = \tan^{-1}(-1.255) , $$

and this has two solutions, $\theta = 2.244$ and $\theta = 5.385$, in the range $0 < \theta < 2\pi$. The
relationships $\beta_1 = \gamma\cos(\theta)$ and $\beta_2 = -\gamma\sin(\theta)$ tell us that $\cos(\theta) < 0$ and $\sin(\theta) > 0$, and
hence $\pi/2 < \theta < \pi$ radians (i.e. $1.57 < \theta < 3.14$ radians), giving the solution $\theta = 2.244$.
This is then translated onto the scale of the time variate by division by $2\pi$ then
multiplication by 12 to give the phase as 4.285 months. In the standard sine curve, the maximum
and minimum occur at one-quarter and three-quarters of the period of the whole cycle,
equivalent to three and nine months from the start of the yearly cycle here. In the fitted
model, we therefore predict the maximum at 3 + 4.285 = 7.285 months (between July and
August) and the minimum at 9 + 4.285 = 13.285 months, which because of the 12 month
cycle is equivalent to 1.285 months into the year (between January and February). These
features of the fitted curve can be verified in Figure 17.8, where the curve is extrapolated
back to month 0 to demonstrate its periodicity. This graph suggests slight model
misspecification at the extremes of the range: the fitted temperatures seem slightly too low
at both the minimum and maximum points, but the fitted model describes the overall
pattern well.
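The calculations in Example 17.2 can be reproduced with a short Python sketch. It uses the twelve monthly means from Table 17.5 and fits the model [1] + Sin + Cos by solving the normal equations with a small hand-written solver, so that no external library is needed. The solver and all variable names are our own illustration rather than the software used for the original analysis, and standard errors are not computed.

```python
import math

# Monthly mean temperatures (deg C) for months 1-12, from Table 17.5.
temperature = [3.1, 3.4, 5.3, 7.7, 11.1, 14.0,
               16.0, 15.7, 13.5, 9.8, 5.9, 4.0]
month = list(range(1, 13))
period = 12

# Design matrix with columns [1], Sin and Cos.
X = [[1.0,
      math.sin(2 * math.pi * m / period),
      math.cos(2 * math.pi * m / period)] for m in month]


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


# Normal equations (X'X) b = X'y give the least squares estimates.
p = len(X[0])
XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
Xty = [sum(row[i] * y for row, y in zip(X, temperature)) for i in range(p)]
alpha, beta1, beta2 = solve(XtX, Xty)

# Amplitude and phase of the equivalent single sine curve.
gamma = math.hypot(beta1, beta2)
theta = math.atan2(-beta2, beta1) % (2 * math.pi)
phase_months = theta * period / (2 * math.pi)
month_of_max = (period / 4 + phase_months) % period
month_of_min = (3 * period / 4 + phase_months) % period

print(f"alpha = {alpha:.3f}, beta1 = {beta1:.3f}, beta2 = {beta2:.3f}")
print(f"amplitude = {gamma:.3f}, phase = {phase_months:.3f} months")
print(f"maximum at month {month_of_max:.3f}, minimum at month {month_of_min:.3f}")
```

The estimates agree with Table 17.6 to the precision reported there, and the predicted months of the maximum and minimum match the values derived in the text.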



One difficulty with trigonometric regression is that the observations are often collected 
as time series or repeated measurements from the same unit, for example, monthly tem- 
peratures at a single site over several years. This often gives rise to serial correlation in 
the deviations, which contradicts the assumptions of independence underlying regres- 
sion (see Section 12.1). Example 17.2 avoids this problem by using mean temperatures 
accumulated over 100 years; averaging over so many years dilutes the influence of serial 
correlations within years. Where strong serial correlation is present, methods for analysis 
of time series or longitudinal data that account for serial correlation should be used. More
details about models for longitudinal data or repeated measurements can be found in 
Diggle et al. (2002). If the length of the cyclic period is unknown and has to be estimated, 
then this is no longer a linear model and the methods of Section 17.3 must be used.

FIGURE 17.8 

Mean monthly temperature (°C) at Rothamsted Experimental Station over the period 1891-1990 with fitted 
curve from trigonometric regression (Example 17.2). 



17.2 Curved Surfaces as Functions of Two or More Variates 

In Chapter 14, we considered MLR models that had several explanatory variates that acted 
independently. For example, in Example 14.1B, we modelled seed weight as a function of 
seed length and hardness. In these models, the change in the response due to one
variate (e.g. length) is assumed to be the same regardless of the value of the other variate
(e.g. hardness). For two explanatory variates, the resulting model can be represented as a 
plane in three-dimensional space (see Figure 14.2). This model is not always realistic, as 
the true three-dimensional surface might be curved rather than planar, which requires 
that the change in the response due to one variate depends on the value of the other vari- 
ate. For example, the change in seed weight due to a change in length might also depend 
on the seed hardness. We can model some types of curvature by including an interaction 
between explanatory variates, and this is the subject of this section. 

For simplicity, we start with the most basic MLR model based on two variates and writ- 
ten in the form 



$$ y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + e_i , \qquad (17.2) $$

where $y_i$ is the value of the ith observation, $x_{1i}$ and $x_{2i}$ are the corresponding values of the
two explanatory variates and $e_i$ is the deviation for that observation. The model parameters
are the intercept $\alpha$ and the slopes, $\beta_1$ and $\beta_2$, respectively, for the two explanatory variates.
We can introduce curvature into the model by including a new term, which combines the
two variates, giving the model

$$ y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 (x_{1i} \times x_{2i}) + e_i . \qquad (17.3) $$

The extra term is equivalent to a new variate calculated by multiplication of the values of
the two explanatory variates for each observation (e.g. $x_{3i} = x_{1i} \times x_{2i}$). In the spirit of crossed
models (see Section 8.2), we might think of this model as containing the main effects of
each explanatory variate plus their interaction. If x1 is the first explanatory variate and x2 is
the second explanatory variate, we can write this model in symbolic form as

Explanatory component: [1] + x1 + x2 + x1.x2

The term x1.x2 represents a variate holding the product of the values of the two individual
variates. We can also interpret this model by rewriting it in a slightly different form as

$$ y_i = (\alpha + \beta_2 x_{2i}) + (\beta_1 + \beta_3 x_{2i}) x_{1i} + e_i = \alpha^* + \beta^* x_{1i} + e_i . \qquad (17.4) $$



Here, the model is considered as a function of values of the first explanatory variate, $x_{1i}$,
for a specified constant value of the second variate, $x_{2i}$. In this form, we can see that this
is like a SLR model in terms of the first explanatory variate where both the intercept (here
$\alpha^* = \alpha + \beta_2 x_{2i}$) and slope (here $\beta^* = \beta_1 + \beta_3 x_{2i}$) depend on the value of the second explanatory
variate. A similar interpretation in terms of the second variate can be formed by reversal
of the roles of the two explanatory variates.

Including the combined term with both explanatory variates allows curvature in the
fitted surface, but this curvature is of a specific form, so it is important to ensure that this
matches the pattern seen in the data. We can check this by plotting residuals against both
variates and by comparing the form of the observed and fitted surfaces using contour or
surface plots. In the case of a designed experiment with replication, we can formally test
for lack of fit (see Section 12.8) by fitting a factor version of the combined variates.

The sequential ANOVA table for the model of Equation 17.3 has three terms: one for each
of the explanatory variates and one for the combined term, each with 1 df. The incremental
sums of squares for the individual variates depend on the order in which they are fitted
unless they are orthogonal (see Section 14.4). As usual, the aim is to find a parsimonious
description of the response, so the simplest possible predictive model is sought. However,
this process must again respect marginality, and both explanatory variates are marginal to
the combined term. The individual explanatory variates should therefore be fitted before
the combined term, and should not be dropped while the combined term is in the model.

EXAMPLE 17.3A: COTTON RESPONSE TO HERBICIDE AND INSECTICIDE 

An experiment was done to evaluate the combined effects of five different doses of
herbicide (0.0, 0.5, 1.0, 1.5 and 2.0 lb/acre) and five different doses of insecticide (0, 20, 40, 60
and 80 lb/acre) on the root growth of cotton plants in containers within a glasshouse.

Four replicates of each treatment combination were arranged in a CRD. After three 
weeks, the dry root biomass (g/plant) was measured for each container. The treatment 
means are presented in Table 17.7 and in file cotton.dat. The residual mean square 
from the factorial model analysis of the raw data was 174 on 75 df. 

In mathematical form, denoting the ith observation of biomass ($Biomass_i$) as a function
of the herbicide ($Herbicide_i$) and insecticide ($Insecticide_i$) doses enables us to write a linear
model with interaction (Equation 17.3) as

$$ Biomass_i = \alpha + \beta_1 Herbicide_i + \beta_2 Insecticide_i + \beta_3 (Insecticide_i \times Herbicide_i) + e_i . $$

If the variates H and I contain the herbicide and insecticide doses, respectively, this
model can be written in the symbolic form

TABLE 17.7

Dry Root Biomass (g/Plant) of 3-Week-Old Cotton Plants from an Experiment Evaluating the Effects
of Different Amounts of Herbicide (H) and Insecticide (I) (Example 17.3 and File cotton.dat)

H      I     Weight       H      I     Weight       H      I     Weight
0       0    122.00       1       0     52.00       2       0     29.25
0      20     82.75       1      20     71.50       2      20     72.00
0      40     65.75       1      40     79.50       2      40     82.50
0      60     68.00       1      60     68.75       2      60     68.25
0      80     57.50       1      80     63.00       2      80     73.25
0.5     0     72.50       1.5     0     36.25
0.5    20     84.75       1.5    20     80.50
0.5    40     68.75       1.5    40     65.75
0.5    60     70.00       1.5    60     77.25
0.5    80     60.75       1.5    80     69.25

Source: Data from Kuehl, R.O. 2000. Design of Experiments: Statistical Principles of Research Design and Analysis
(2nd edition). Thomson Learning (Duxbury Press), Pacific Grove, California. 666 pp.



Response variable: Biomass

Explanatory component: [1] + H + I + H.I

where H.I can be calculated as the product of the variates H and I. The sequential ANOVA
table for this model (Table 17.8) partitions variation between the treatment means into
that accounted for by the explanatory terms and a remainder, which can be used to test
lack of fit (see Section 12.8). The residual is calculated from variation between replicates
and is an estimate of pure error, uncontaminated by lack of fit, and so we choose to use
this residual for testing model terms.

We can first test the model as a whole by comparing the model mean square with the
residual mean square ($F_{3,75}$ = 6.06, P < 0.001) and this model accounts for 15.8% of the
variation in the data (adjusted $R^2$ = 0.158). As we can partition the variation into pure
error and treatment variation, we can also calculate the percentage of treatment
variation accounted for by comparing the remainder mean square (157 with 21 df) with the
treatment mean square (291 with 24 df) as

$$ 1 - \frac{\text{Remainder MS}}{\text{Treatment MS}} = 1 - \frac{3295/21}{(573 + 5 + 3102 + 3295)/24} = 0.460 . $$
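The arithmetic behind these percentages can be retraced from the entries of Table 17.8 with a few lines of Python. The variable names are our own. The adjusted $R^2$ for the data as a whole is reproduced here by pooling the remainder with the pure error to form the residual; that pooling is our assumption about how the quoted 15.8% arises, although it does recover the reported value.

```python
# Sums of squares and degrees of freedom copied from Table 17.8.
model_ss = {"H": 573.0, "I": 5.0, "H.I": 3102.0}  # 1 df each
remainder_ss, remainder_df = 3295.0, 21
pure_error_ss, pure_error_df = 13050.0, 75
total_ss, total_df = 20025.0, 99

# Treatment variation = fitted terms + remainder (24 df in total).
treatment_ss = sum(model_ss.values()) + remainder_ss
treatment_df = len(model_ss) + remainder_df
remainder_ms = remainder_ss / remainder_df
treatment_ms = treatment_ss / treatment_df

# Proportion of treatment variation accounted for by the model.
prop_treatment = 1.0 - remainder_ms / treatment_ms

# Adjusted R^2 for the data as a whole (assumes a pooled residual).
pooled_ms = (remainder_ss + pure_error_ss) / (remainder_df + pure_error_df)
adj_r2 = 1.0 - pooled_ms / (total_ss / total_df)

# Lack-of-fit variance ratio: remainder MS against pure error MS.
pure_error_ms = pure_error_ss / pure_error_df
f_lack_of_fit = remainder_ms / pure_error_ms

print(f"treatment variation accounted for: {prop_treatment:.3f}")
print(f"adjusted R^2 (whole data): {adj_r2:.3f}")
print(f"lack-of-fit variance ratio: {f_lack_of_fit:.3f}")
```

The contrast between the two percentages reflects how much of the total variation is pure error between replicates, which no model for the treatment means can account for.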



TABLE 17.8

Sequential ANOVA Table for Cotton Root Biomass Model in Terms of Variates H (Herbicide Dose),
I (Insecticide Dose) and the Combined Term H.I (Example 17.3A)

Change       Incremental df   Incremental SS   Mean Square   Variance Ratio        P
+ H                1               573              573      $F^{H}$ = 3.29        0.074
+ I                1                 5                5      $F^{I}$ = 0.03        0.866
+ H.I              1              3102             3102      $F^{H.I}$ = 17.83     < 0.001
Remainder         21              3295              157      $F^{Rem}$ = 0.90      0.589
Residual          75            13,050              174
Total             99            20,025

Note: SS = sum of squares.




TABLE 17.9

Parameter Estimates with Standard Errors (SE), t-Statistics (t) and Observed Significance Levels (P)
for Cotton Root Biomass Model in Terms of Variates H (Herbicide Dose), I (Insecticide Dose) and
Their Interaction H.I (Example 17.3A)

Term    Parameter     Estimate    SE        t          P
[1]     $\alpha$       99.350     7.915     12.553     < 0.001
H       $\beta_1$     -29.050     6.4622    -4.495     < 0.001
I       $\beta_2$      -0.573     0.1616    -3.545     < 0.001
H.I     $\beta_3$       0.557     0.1319     4.223     < 0.001




This model therefore accounts for 46.0% of the treatment variation. We then
consider the individual model terms. Because of the balanced allocation of treatments,
variates H and I are orthogonal, and so we get the same incremental SS and tests for
these terms fitted in either order. But we first examine the combined term, H.I, to see
if we can simplify the model and find that the variance ratio ($F^{H.I}_{1,75}$ = 17.830, P < 0.001)
is highly significant and so we cannot. The predictive model therefore uses both
variates and their combined term, and the estimated parameters are listed in Table
17.9. The remainder sum of squares gives no evidence of lack of fit ($F^{Rem}_{21,75}$ = 0.902,
P = 0.589).

In the form of Equation 17.4, with biomass as a function of herbicide dose for a given
value of insecticide dose, the predictive model can be written as

$$ \hat{\mu}(H, I) = (99.35 - 0.573 I) + (-29.05 + 0.557 I) H , $$

where for brevity now H indicates the herbicide dose and I indicates insecticide dose
applied. As the insecticide dose increases, the slope for herbicide increases and the
intercept decreases. The fitted model is plotted with the observations in this form in
Figure 17.9, and it can be seen that the slope is negative for small doses of insecticide and
positive for larger doses.
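The way the intercept and herbicide slope change with insecticide dose can be tabulated with a short Python sketch based on the estimates in Table 17.9. The function and variable names are our own; the five insecticide doses are those shown in the panels of Figure 17.9.

```python
# Parameter estimates from Table 17.9 (Example 17.3A).
alpha, beta1, beta2, beta3 = 99.350, -29.050, -0.573, 0.557


def line_for_insecticide(i_dose):
    """Intercept and herbicide slope at a fixed insecticide dose,
    following the rearrangement in Equation 17.4."""
    intercept = alpha + beta2 * i_dose
    slope = beta1 + beta3 * i_dose
    return intercept, slope


lines = {i_dose: line_for_insecticide(i_dose) for i_dose in (0, 20, 40, 60, 80)}

# Insecticide dose at which the herbicide slope changes sign.
crossover = -beta1 / beta3

for i_dose, (intercept, slope) in lines.items():
    print(f"I = {i_dose:2d}: intercept = {intercept:7.2f}, slope = {slope:7.2f}")
print(f"slope changes sign near I = {crossover:.1f}")
```

Under this model the slope changes linearly with insecticide dose, which is the specific form of curvature that the interaction term imposes on the fitted surface.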

Although a straight line seems a reasonable approximation to the shape for each
insecticide dose, it is apparent that the fitted lines are not giving the best possible fit to
the observations: the slope is clearly too gentle for the zero dose and too steep for the
largest dose. Figure 17.10 shows contour plots for the observations and for the fitted
model, which allows a visual comparison of the observed and fitted surfaces. Although
there is some similarity across the two surfaces in terms of general trends, it is clear
that the fitted model does not reproduce the observed trends well. This contradicts the
non-significant test for lack of fit (based on $F^{Rem}$), which indicates that the discrepancy
between the fitted values and treatment means is small compared with background
variation. However, the presence of systematic (rather than random) discrepancies in
the fitted model, as seen in Figure 17.9, suggests that some improvement in fit may be
possible.



FIGURE 17.9

Observed cotton root dry biomass (g/plant) with predictive model in terms of herbicide dose (lb/acre),
insecticide dose (a) 0, (b) 20, (c) 40, (d) 60, (e) 80 lb/acre, and their interaction (Example 17.3A).

FIGURE 17.10

(a) Observed cotton root dry biomass (g/plant) and (b) predictive model for cotton root biomass in terms of
herbicide dose (lb/acre), insecticide dose (lb/acre) and their interaction (Example 17.3A).

An interaction between variates introduces one type of curvature into the fitted surface,
but more general forms will often be required. The simplest generalization is to extend
the methodology used for polynomial models (Section 17.1.2) to two dimensions. A model
of order q then contains all combinations of the explanatory variates with powers that
sum to $\leq q$. A first-order model for two explanatory variates contains both individual
variates (order 1) but not their interaction, which is of order 2; this model is a standard
MLR (Equation 17.2). A second-order model adds the combined term and the squares of
both variates. For convenience, we label the coefficients in these models by the powers of
the explanatory variates in each term; hence, $\beta_{11}$ is the slope associated with the product
$x_{1i} \times x_{2i}$. The second-order model can be written as

$$ y_i = \alpha + \beta_{10} x_{1i} + \beta_{01} x_{2i} + \beta_{20} x_{1i}^2 + \beta_{11} (x_{1i} \times x_{2i}) + \beta_{02} x_{2i}^2 + e_i . \qquad (17.5) $$

Similarly, a third-order model takes the form

$$ \begin{aligned}
y_i = {} & \alpha + \beta_{10} x_{1i} + \beta_{01} x_{2i} + \beta_{20} x_{1i}^2 + \beta_{11} (x_{1i} \times x_{2i}) + \beta_{02} x_{2i}^2 \\
& + \beta_{30} x_{1i}^3 + \beta_{21} (x_{1i}^2 \times x_{2i}) + \beta_{12} (x_{1i} \times x_{2i}^2) + \beta_{03} x_{2i}^3 + e_i . \qquad (17.6)
\end{aligned} $$

Again, the components $x_{1i} \times x_{2i}$, $x_{1i}^2 \times x_{2i}$ and $x_{1i} \times x_{2i}^2$ represent combinations of powers of
the two explanatory variates, calculated by multiplying the appropriate powers together.
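The set of terms in a polynomial model of a given order for two explanatory variates can be enumerated mechanically. The Python sketch below is our own illustration: each term is represented by the pair of powers (j, k) of the first and second variate, matching the subscripts used to label the coefficients, and a model of order q contains every pair with 1 <= j + k <= q. The labels x1 and x2 are placeholders for the two variates.

```python
def polynomial_terms(order):
    """Exponent pairs (j, k) for the terms x1^j * x2^k with
    1 <= j + k <= order, listed by increasing total order."""
    return [(j, total - j)
            for total in range(1, order + 1)
            for j in range(total, -1, -1)]


def term_label(j, k):
    """Symbolic label for the term with powers j and k."""
    parts = []
    if j:
        parts.append("x1" if j == 1 else f"x1^{j}")
    if k:
        parts.append("x2" if k == 1 else f"x2^{k}")
    return ".".join(parts)


second_order = polynomial_terms(2)
third_order = polynomial_terms(3)

print("[1] + " + " + ".join(term_label(j, k) for j, k in second_order))
print("[1] + " + " + ".join(term_label(j, k) for j, k in third_order))
```

The second-order list has five terms and the third-order list has nine, in the same order as the coefficients in the two equations above.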

The potential problems associated with polynomial models for a single explanatory
variate, namely, collinearity between terms and over-fitting, may also be encountered with
these models for several variates. The same solutions also apply, so collinearity can be
reduced by the use of centered or orthogonal polynomials in place of simple powers, and
the full fitted curve or surface should be plotted on a dense grid of values to check for any
undesirable features. The previous model-building strategy can also be extended to two
explanatory variates, so we start with a low-order model and use visual checks to see if
the model fit is adequate. If replication is present then we can make a formal test for lack
of fit. If the model is not adequate, then a set of higher-order terms can be added. Once you
have found a suitable order, you should check whether the model can be simplified by
testing the highest-order terms with marginal F-tests. The least significant term is dropped
first, and then other terms retested. This process must respect marginality, so that if a
term is retained in the model then all terms marginal to it should also be retained (which
makes the model invariant to changes of scale). At each stage, a term is eligible for
testing if it is not marginal to (i.e. a sub-term of) any other term still in the model (see Section
8.3.1). A sub-term is one that has all of its components in common with the term, so, for
example, $x_{1i}$, $x_{1i}^2$, $x_{2i}$ and $x_{1i} \times x_{2i}$ are all sub-terms of $x_{1i}^2 \times x_{2i}$. This process is demonstrated
in Example 17.3B. In the predictive model, all terms eligible for testing should have
statistically significant marginal F-tests.
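The eligibility rule can be expressed compactly if each term is again represented by its pair of powers (j, k): one term is a sub-term of another when neither of its powers exceeds the corresponding power in the other term. The Python sketch below is our own illustration of that rule, applied first to a full third-order model and then to the reduced set of terms that remains after the two cubes and the term with powers (2, 1) have been dropped.

```python
def is_subterm(a, b):
    """True if term a = (j, k) is marginal to a different term b,
    i.e. neither power of a exceeds the corresponding power of b."""
    return a != b and a[0] <= b[0] and a[1] <= b[1]


def eligible(terms):
    """Terms that are not sub-terms of any other term in the model,
    and so may be tested with a marginal F-test."""
    return [t for t in terms
            if not any(is_subterm(t, other) for other in terms)]


third_order = [(1, 0), (0, 1), (2, 0), (1, 1), (0, 2),
               (3, 0), (2, 1), (1, 2), (0, 3)]
print(eligible(third_order))

# Model remaining after dropping (3, 0), (0, 3) and (2, 1).
reduced = [(1, 0), (0, 1), (2, 0), (1, 1), (0, 2), (1, 2)]
print(eligible(reduced))
```

In the reduced model the square of the first variate becomes eligible because it is no longer a sub-term of any retained term, whereas the linear terms, the product and the square of the second variate remain protected by the term with powers (1, 2).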



EXAMPLE 17.3B: COTTON RESPONSE TO HERBICIDE AND INSECTICIDE 

In Example 17.3A, we detected systematic discrepancies between the observed biomass
and the predictive model with individual variates H (herbicide dose), I (insecticide dose)
and their interaction, H.I. Here, we consider higher-order models to see if a better fit can
be obtained. To avoid problems with collinearity, we centre each variate before
calculating powers. The centered variates are calculated as cH = H - 1 and cI = I - 40, using
variates defined in Example 17.3A.

For a second-order model, we need the main variates cH, cI, and the products
cH.cI = cH x cI, (cH)^2 = cH x cH and (cI)^2 = cI x cI. The second-order model, from Equation
17.5, can then be written in symbolic form as

Response variable: Biomass

Explanatory component: [1] + cH + cI + (cH)^2 + cH.cI + (cI)^2

This model accounts for 17.5% of the variation in the data (and 52.3% of the variation in
the treatment means). Examination of the fitted curves and surface (as in Figures 17.9
and 17.10) shows that although this fitted model is closer to the observations, there are
still clear systematic discrepancies visible. We therefore try a third-order model, as
presented in Equation 17.6. This model requires four additional variates to be calculated,
i.e. the cubes of the original variates and the products of the squared and linear terms,
and can be written in symbolic form as

Explanatory component: [1] + cH + cI + (cH)^2 + cH.cI + (cI)^2 + (cH)^3 + (cH)^2.cI
                       + cH.(cI)^2 + (cI)^3

Because the design is orthogonal, and we use the estimate of pure error as our residual
mean square, we get a single sequential ANOVA table for this model shown in Table
17.10.

In the full model, we first test the cube terms and both the herbicide dose (cH)^3
($F^{H3}_{1,75}$ = 0.03, P = 0.871) and insecticide dose (cI)^3 ($F^{I3}_{1,75}$ = 0.94, P = 0.336) terms can be removed
from the model. Because the terms are orthogonal, we can immediately examine the
third-order cross-product terms from the same table, and drop the product of the square
of herbicide dose with insecticide dose (cH)^2.cI ($F^{H2.I}_{1,75}$ = 1.36, P = 0.247). The square of
herbicide dose (cH)^2 is then eligible for testing, and this term can also be removed from the
model ($F^{H2}_{1,75}$ = 0.63, P = 0.432). No further terms can be removed. Parameter estimates in
the final model are listed in Table 17.11, giving the predictive model as

$$ \begin{aligned}
\hat{\mu}(H, I) = {} & 75.29 + 4.84(H - 1) - 0.016(I - 40) + 0.557(H - 1)(I - 40) \\
& - 0.0070(I - 40)^2 - 0.01452(H - 1)(I - 40)^2 .
\end{aligned} $$

Because we have constructed the model in terms of centered variates, these centered 
variates must appear in the predictive model. 

If we expand each term in full and gather together the coefficients for each combina- 
tion of variables, the predictive model can be rewritten as 

$$ \begin{aligned}
\hat{\mu}(H, I) & = 105.34 - 40.66H - 1.17I + 1.718HI + 0.007I^2 - 0.0145HI^2 \\
& = 105.34 - 1.17I + 0.007I^2 + (-40.66 + 1.718I - 0.0145I^2)H .
\end{aligned} $$
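The expansion from centered to uncentered variates can be checked numerically. The Python sketch below starts from the estimates in Table 17.11, expands each centered term algebraically, and confirms that the two forms of the predictive model agree at every treatment combination. The variable names are our own; because the tabulated estimates are rounded, the expanded constant differs slightly from the 105.34 quoted in the text.

```python
# Estimates from Table 17.11, for centered variates cH = H - 1, cI = I - 40.
a, b10, b01, b11, b02, b12 = 75.29, 4.844, -0.016, 0.557, -0.0070, -0.01452
h0, i0 = 1.0, 40.0  # centering constants


def centered_form(h, i):
    """Predictive model written in terms of the centered variates."""
    ch, ci = h - h0, i - i0
    return (a + b10 * ch + b01 * ci + b11 * ch * ci
            + b02 * ci ** 2 + b12 * ch * ci ** 2)


# Coefficients obtained by multiplying out each centered term.
const = a - b10 * h0 - b01 * i0 + b11 * h0 * i0 + b02 * i0 ** 2 - b12 * h0 * i0 ** 2
coef_h = b10 - b11 * i0 + b12 * i0 ** 2
coef_i = b01 - b11 * h0 - 2 * b02 * i0 + 2 * b12 * h0 * i0
coef_hi = b11 - 2 * b12 * i0
coef_i2 = b02 - b12 * h0
coef_hi2 = b12


def expanded_form(h, i):
    """The same model written in simple powers of H and I."""
    return (const + coef_h * h + coef_i * i + coef_hi * h * i
            + coef_i2 * i ** 2 + coef_hi2 * h * i ** 2)


max_diff = max(abs(centered_form(h, i) - expanded_form(h, i))
               for h in (0.0, 0.5, 1.0, 1.5, 2.0)
               for i in (0.0, 20.0, 40.0, 60.0, 80.0))

print(f"constant {const:.2f}, H {coef_h:.2f}, I {coef_i:.2f}, "
      f"HI {coef_hi:.3f}, I^2 {coef_i2:.4f}, HI^2 {coef_hi2:.4f}")
print("largest difference between the two forms:", max_diff)
```

An exact translation of this kind is possible because all terms marginal to cH.(cI)^2 were retained in the model.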



TABLE 17.10

Sequential ANOVA Table for Third-Order Polynomial Model for Cotton Root Biomass Models in
Terms of Centered Variates cH (Herbicide Dose) and cI (Insecticide Dose) (Example 17.3B)

Change         Incremental df   Incremental SS   Mean Square   Variance Ratio         P
+ cH                 1               573              573      $F^{H}$ = 3.29         0.074
+ cI                 1                 5                5      $F^{I}$ = 0.03         0.866
+ (cH)^2             1               109              109      $F^{H2}$ = 0.63        0.432
+ cH.cI              1              3102             3102      $F^{H.I}$ = 17.83      < 0.001
+ (cI)^2             1               553              553      $F^{I2}$ = 3.18        0.079
+ (cH)^3             1                 5                5      $F^{H3}$ = 0.03        0.871
+ (cH)^2.cI          1               237              237      $F^{H2.I}$ = 1.36      0.247
+ cH.(cI)^2          1              1180             1180      $F^{H.I2}$ = 6.78      0.011
+ (cI)^3             1               163              163      $F^{I3}$ = 0.94        0.336
Remainder           15              1049               70      $F^{Rem}$ = 0.40       0.975
Residual            75            13,050              174
Total               99            20,025

Note: SS = sum of squares.

TABLE 17.11

Parameter Estimates with Standard Errors (SE), t-Statistics (t) and Observed Significance Levels
(P) for Cotton Root Biomass Predictive Model in Terms of Centered Variates cH (= H - 1) and cI
(= I - 40) (Example 17.3B)

Term          Parameter        Estimate     SE          t          P
[1]           $\alpha$          75.29       4.111       18.313     < 0.001
cH            $\beta_{10}$       4.844      5.814        0.833     0.407
cI            $\beta_{01}$      -0.016      0.0933      -0.169     0.866
cH.cI         $\beta_{11}$       0.557      0.1319       4.223     < 0.001
(cI)^2        $\beta_{02}$      -0.0070     0.00394     -1.783     0.079
cH.(cI)^2     $\beta_{12}$      -0.01452    0.005574    -2.604     0.011




In terms of herbicide (H), the predictive model is still a straight line for a given value
of insecticide, but both the intercept and the slope vary as a quadratic function of
insecticide dose. These straight lines are shown in Figure 17.11, and clearly provide a much
better fit to the data than those from the simpler model shown in Figure 17.9. The shape
of the fitted surface, shown as a contour plot in Figure 17.12, also appears a more
reasonable fit to the observed surface. This model accounts for 23.2% of the total variation,
and 71.7% of the treatment variation.
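To make the "straight line in H for each value of I" reading concrete, the short Python sketch below evaluates the intercept and slope of the expanded predictive model at the five insecticide doses shown in Figure 17.11. The function names are ours, and the coefficients are the rounded values printed in the expanded model, so the output is illustrative rather than exact.

```python
def intercept(I):
    """Intercept of the fitted line in H at insecticide dose I (rounded coefficients)."""
    return 105.34 - 1.17 * I + 0.007 * I**2


def slope(I):
    """Slope of the fitted line in H at insecticide dose I (rounded coefficients)."""
    return -40.66 + 1.718 * I - 0.0145 * I**2


for dose in (0, 20, 40, 60, 80):
    print(f"I = {dose:2d}: intercept = {intercept(dose):7.2f}, slope = {slope(dose):7.2f}")
```

At I = 40 the slope is about 4.86, close to the estimate of 4.84 for cH in Table 17.11, as expected because the centered variate cI is zero at that dose.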

Given that we detected no formal evidence of lack of fit in our original model (linear
plus interaction, Example 17.3A), we should check that our final model gives a quantifiable
improvement in fit. For these two nested models, we can construct an F-test based on the
change in the model sum of squares and df on adding the extra terms into the model (see end
of Section 14.4). For the linear plus interaction model, we found ModSS_1 = 3680 with
ModDF_1 = 3, compared to ModSS_2 = 5414 with ModDF_2 = 5 for our final predictive model. We
compare this change to our estimate of background variation, i.e. ResMS = 174 on 75 df. The
F-statistic is calculated as

$$F = \frac{(\mathrm{ModSS}_2 - \mathrm{ModSS}_1)/(\mathrm{ModDF}_2 - \mathrm{ModDF}_1)}{\mathrm{ResMS}}
    = \frac{(5414 - 3680)/(5 - 3)}{174} = \frac{1734/2}{174} = 4.98 ,$$

with 2 and 75 df, giving P = 0.009. There is thus strong evidence that the final model
gives a better fit compared to the simpler model. This example demonstrates that the
lack-of-fit test sometimes lacks power; it might be possible to improve a model even
when the formal test for lack of fit is not statistically significant.
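The arithmetic of this comparison can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library: the function name and arguments are ours, and the P-value uses a closed form for the upper tail of the F-distribution that holds only when the numerator df equals 2, as it does here; a statistical package would normally be used for the general case.

```python
def nested_f_test(mod_ss_1, mod_df_1, mod_ss_2, mod_df_2, res_ms, res_df):
    """F-test for the improvement of a larger nested model over a simpler one."""
    df_change = mod_df_2 - mod_df_1
    f_stat = ((mod_ss_2 - mod_ss_1) / df_change) / res_ms
    if df_change != 2:
        raise ValueError("closed-form P-value below assumes a numerator df of 2")
    # Upper-tail probability of F with 2 and res_df degrees of freedom.
    p_value = (1 + 2 * f_stat / res_df) ** (-res_df / 2)
    return f_stat, df_change, p_value


# Values from Example 17.3: linear plus interaction model versus final model.
f_stat, df_change, p_value = nested_f_test(3680, 3, 5414, 5, res_ms=174, res_df=75)
print(f"F = {f_stat:.2f} on {df_change} and 75 df, P = {p_value:.3f}")
```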



Example 17.3 was a designed experiment, and so a balanced set of combinations of the 
two variates had been used, which made the variates orthogonal and which greatly simpli- 
fied the process of model selection. This is much less likely to occur in observational data, 
especially where variates are correlated. In general, many more observations are required 
to get a good spread of observations across two explanatory variates than for one, a situa- 
tion known as the curse of dimensionality. Good coverage of the two-dimensional space 
spanned by two explanatory variates is essential if the model is to be robust across the 
full space, and you can check this by plotting the explanatory variates against each other. 
Predictions for regions with few observations should be treated as extrapolation and can 
be unreliable. 

The models presented in this section, sometimes called response surface models, can be
extended to three or more explanatory variates, but verification of the form of the model
becomes much harder, because a full visual representation of the model requires four or
more dimensions. The curse of dimensionality also means that coverage is likely to be
inadequate unless the observations come from a designed experiment.

FIGURE 17.11

Observed cotton root dry biomass (g/plant) with predictive model in terms of herbicide dose
(lb/acre) and insecticide dose (a) 0, (b) 20, (c) 40, (d) 60, (e) 80 lb/acre (Example 17.3B).

FIGURE 17.12

(a) Observed cotton root dry biomass (g/plant) and (b) predictive model for cotton root
biomass in terms of herbicide dose (lb/acre) and insecticide dose (lb/acre) (Example 17.3B).



17.3 Fitting Models Including Non-Linear Parameters 

All of the models considered so far have been within the class of linear models, which 
means that the model can be written as a set of terms added together, each of which consists 
of an unknown coefficient (a model parameter) multiplied by a known value (an explana- 
tory variable); for example, see Equation 17.2. As we have seen earlier in this chapter, these 
models can be used to fit curved as well as straight line relationships. However, these strate- 
gies provide only a limited range of models, and in some cases, a good fit cannot be found 
with this approach. The set of possible models can be widened by introducing non-linear 
models, i.e. models that cannot be written in linear form. An advantage of these models 
is that for some types of response, where there is a good understanding of the underlying 
process, a non-linear model with biologically meaningful parameters may be constructed. 

One simple example of a non-linear model takes the form

$$y_i = \alpha + \beta x_i^{\theta} + e_i ,$$

where α, β and θ are parameters to be estimated. If θ was fixed (e.g. θ = 2), so the quantity
x_i^θ was known, then this would be a linear model; it is the presence of θ as an unknown
parameter that makes this model non-linear. Non-linear models can include several
explanatory variates, although we consider only the case of a single explanatory variate
here. In general, they may include several non-linear parameters, and so the number of
parameters will not necessarily be one greater than the number of explanatory variates.
For now, we label the set of p parameters in a non-linear model as γ_1 ... γ_p. Any non-linear
model with a single explanatory variate can be written in general terms as

$$y_i = f(x_i, \gamma_1 \ldots \gamma_p) + e_i ,$$

where y_i is the ith observation with value x_i of the explanatory variate and deviation e_i,
and f(x_i, γ_1 ... γ_p) gives the form of the non-linear function. For the example given above,
we have

$$f(x_i, \gamma_1, \gamma_2, \gamma_3) = \gamma_1 + \gamma_2 x_i^{\gamma_3} ,$$

where γ_1 = α, γ_2 = β and γ_3 = θ. Our symbolic notation does not adapt easily to this
framework, and so we do not use it here.

As for the linear models of earlier sections, we obtain least squares estimates for the
parameters, but the process must be modified to estimate the non-linear parameters. The
least squares estimates minimize the residual sum of squares (see Section 1.5), which for
non-linear regression takes the form

$$\mathrm{ResSS} = \min_{\gamma_1 \ldots \gamma_p} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2
  = \min_{\gamma_1 \ldots \gamma_p} \sum_{i=1}^{N} \left[ y_i - f(x_i, \gamma_1 \ldots \gamma_p) \right]^2 .$$

Unlike the corresponding equations for linear models, these equations cannot be solved
directly and an iterative algorithm is required to search numerically for the least squares
estimates (see Seber and Wild, 1989, for further information). This algorithm might be
unstable if it starts far from the solution: it might fail to converge, or it might appear to
converge but at a point that does not give a global minimum of the ResSS (known as a local
minimum). For this reason, you should provide good initial values for parameters whenever
possible, for example, from previous related work or prior knowledge. Alternatively,
several different sets of initial parameter values, for example, covering a regular grid, can
be tried and the fitted model with the smallest ResSS is selected. Always plot the model
with the observations to ensure that the fit is adequate. The assumptions presented in
Sections 4.1 and 12.1 also apply to non-linear models, and residuals should be examined
to check for model misspecification and the validity of the assumptions with the graphical
diagnostic tools of Chapters 5 and 13.
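The idea of searching over a regular grid can be illustrated with the power model introduced at the start of this section. Because the model is linear in its remaining parameters once the exponent is fixed, a simple sketch is to try each exponent on a grid, fit the other two parameters by ordinary least squares, and keep the combination with the smallest ResSS. The Python code below is our own illustration under those assumptions, with made-up, noise-free data; it is not the iterative algorithm used by statistical software, and the function names are hypothetical.

```python
def fit_linear(z, y):
    """Least squares fit of y = intercept + slope * z; returns (intercept, slope, ResSS)."""
    n = len(z)
    z_bar, y_bar = sum(z) / n, sum(y) / n
    s_zz = sum((zi - z_bar) ** 2 for zi in z)
    s_zy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    slope = s_zy / s_zz
    intercept = y_bar - slope * z_bar
    res_ss = sum((yi - intercept - slope * zi) ** 2 for zi, yi in zip(z, y))
    return intercept, slope, res_ss


def fit_power_model(x, y, exponent_grid):
    """Grid search over the exponent; linear least squares for the other parameters."""
    best = None
    for exponent in exponent_grid:
        z = [xi ** exponent for xi in x]       # known variate once the exponent is fixed
        intercept, slope, res_ss = fit_linear(z, y)
        if best is None or res_ss < best[3]:
            best = (intercept, slope, exponent, res_ss)
    return best


# Illustrative noise-free data generated with intercept 2, slope 3 and exponent 1.5.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0 + 3.0 * xi ** 1.5 for xi in x]
grid = [k / 10 for k in range(5, 31)]          # exponents 0.5, 0.6, ..., 3.0

alpha_hat, beta_hat, theta_hat, res_ss = fit_power_model(x, y, grid)
print(alpha_hat, beta_hat, theta_hat, res_ss)
```

With real data the grid minimum would usually serve as the initial value for a proper iterative fit rather than as the final estimate.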

Some of the most common non-linear curves are the exponential, logistic, Gompertz and
inverse linear models and we discuss these briefly here, with some typical curve shapes
shown in Figure 17.13.

The standard exponential model has the form

$$y_i = \alpha + \beta \exp(-\gamma x_i) + e_i . \qquad (17.7)$$

Interpretation of individual parameters is not straightforward, but we might think of α as
setting the level of the curve (analogous to an intercept), β as controlling the scaling (or
effective range) of the curve and γ as controlling the curvature. The direction of this curve
(and those considered below) varies according to the signs of the parameters β and γ. The
form for β > 0 and γ > 0 is shown in Figure 17.13a. Here, the value of the curve decreases
as the explanatory variate increases, it crosses the y-axis at intercept value α + β, and
decreases towards a lower asymptote at value α. The rate of change decreases as the value
of the explanatory variate increases. This curve is said to have a lower asymptote (limit) to
the right (as the explanatory variate increases). An alternative to the exponential model is
given by the inverse linear model, with form

$$y_i = \alpha + \frac{\beta}{1 + \gamma x_i} + e_i .$$

For β > 0 and γ > 0 (Figure 17.13b), this curve also has a decreasing form with an intercept
of α + β and a lower right asymptote of α, but is less sharply curved than the exponential
model and approaches the asymptote more slowly. Varying the signs of parameters
β and γ gives curves of different shapes. The exponential and inverse linear models both
give decreasing functions with a lower right asymptote when β > 0 and γ > 0; increasing
functions with a lower left asymptote with β > 0 and γ < 0; increasing functions with an
upper right asymptote with β < 0 and γ > 0; and decreasing functions with an upper left
asymptote when β < 0 and γ < 0. These models are often used for modelling growth curves
or decay functions.

FIGURE 17.13

(a) Exponential curve with α = 5, β = 10, γ = 2; (b) inverse linear curve with α = 5, β = 10,
γ = 2; (c) logistic curve with α = 5, β = 10, γ = 2, δ = 2; (d) Gompertz curve with α = 5,
β = 10, γ = 2, δ = 2. Observations generated as function plus Normal deviations with common
variance.

For S-shaped growth curves, we consider the logistic and Gompertz models. The logistic
model takes the form

$$y_i = \alpha + \frac{\beta}{1 + \exp[-\gamma(x_i - \delta)]} + e_i .$$

Again, we might think of α as setting the level of the curve, β as controlling the scale (or
effective range) of the curve, γ controlling the curvature and the new parameter δ as
defining the positioning of the curve with respect to values of the explanatory variate. A
logistic model is shown in Figure 17.13c in the form with β > 0 and γ > 0. This curve has a
lower left asymptote at value α as the explanatory variate decreases and an upper right
asymptote at value α + β as the explanatory variate increases. The curvature is symmetric
about δ, which is the value of the explanatory variate at which the slope of the curve is
steepest (known as the inflexion point). At this point, the curve is at the midway point
between the two asymptotes, and takes the value α + β/2.

Another S-shaped curve is the Gompertz model, which is written as

$$y_i = \alpha + \beta \exp\{-\exp[-\gamma(x_i - \delta)]\} + e_i ,$$

and is shown in Figure 17.13d with β > 0 and γ > 0. This curve also has a lower left
asymptote at value α and an upper right asymptote at α + β, but is asymmetric about the value
x = δ. Again, the direction and shape of the logistic and Gompertz models can be manipulated
by changing the signs and values of the parameters β and γ.
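The four curves described above are easy to evaluate directly, which helps when checking the stated intercepts, asymptotes and inflexion values. The Python sketch below defines the systematic part of each model (without the deviations) and evaluates it with the parameter values used in Figure 17.13; the function and argument names are ours.

```python
import math


def exponential(x, alpha, beta, gamma):
    return alpha + beta * math.exp(-gamma * x)


def inverse_linear(x, alpha, beta, gamma):
    return alpha + beta / (1 + gamma * x)


def logistic(x, alpha, beta, gamma, delta):
    return alpha + beta / (1 + math.exp(-gamma * (x - delta)))


def gompertz(x, alpha, beta, gamma, delta):
    return alpha + beta * math.exp(-math.exp(-gamma * (x - delta)))


# Parameter values from Figure 17.13: alpha = 5, beta = 10, gamma = 2, delta = 2.
print(exponential(0, 5, 10, 2))      # intercept alpha + beta
print(inverse_linear(0, 5, 10, 2))   # intercept alpha + beta
print(logistic(2, 5, 10, 2, 2))      # value at x = delta is alpha + beta/2
print(gompertz(2, 5, 10, 2, 2))      # value at x = delta is alpha + beta/e
print(exponential(50, 5, 10, 2))     # close to the lower asymptote alpha
print(logistic(50, 5, 10, 2, 2))     # close to the upper asymptote alpha + beta
```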



EXAMPLE 17.1C: OLSEN P 

In this example, we compare the fit of the exponential and inverse linear models to
those fitted in Examples 17.1A and B. We first consider the exponential model, which
accounts for 83.1% of the variation (adjusted R^2 = 0.831) and gives fitted model

$$\hat{\mu}(\mathit{OlsenP}) = 4.405 - 4.412 \times \exp(-0.359 \times \mathit{OlsenP}) .$$

This model has β < 0 and γ > 0 and so is an increasing curve with an upper asymptote,
as required by the shape of the response. The upper asymptote is at α = 4.405, but the
intercept with the y-axis is not of interest as it is well outside of the range of the Olsen P
measurements. The fitted curve is shown with 95% CIs in Figure 17.14a and clearly
follows the pattern of response well.

In comparison, the inverse linear model accounted for 83.5% of the variation (adjusted
R^2 = 0.835) and gives the fitted model

$$\hat{\mu}(\mathit{OlsenP}) = 4.962 + \frac{\hat{\beta}}{1 + (1.208 \times \mathit{OlsenP})} .$$

Within the range of the observations, this model has a very similar shape to that of the
exponential model (Figure 17.14b). It has a slightly higher asymptote (α = 4.962) but
decreases much more sharply below the smallest Olsen P measurement. The 95% CIs for
these two non-linear models have quite different shapes, reflecting different sources of
uncertainty in the two models. The inverse linear model shows much more uncertainty
around the point of maximum curvature, whereas the exponential model shows more
uncertainty moving towards the upper asymptote.

Table 17.12 summarizes the goodness-of-fit statistics for the models fitted in all
parts of Example 17.1. Based on the adjusted R^2 and AIC statistics, the non-linear
models fit better than all but the eighth-order polynomial model, which we previously
dismissed on the grounds of over-fitting. The SBC, which penalizes the number of
parameters more heavily, shows a clear preference for the non-linear models. There
is little statistical difference in the fit of the two non-linear models, so either might
reasonably be selected.



FIGURE 17.14

(a) Fitted exponential model with 95% CI and (b) fitted inverse linear model with 95% CI for
yield in terms of Olsen P measurements (Example 17.1B).

TABLE 17.12

Summary Statistics for Models for Yield as a Function of Olsen P Measurements (Example 17.1A,
17.1B or 17.1C)

Example   Model                       Number of     ResMS     R^2 adj   AIC     SBC
                                      Parameters              (x100)
17.1A     SLR with log10(Olsen P)         2         0.1191    79.7      19.2    21.2
17.1B     Quadratic polynomial            3         0.1126    80.8      19.0    22.0
17.1B     Cubic polynomial                4         0.1114    81.0      19.6    23.5
17.1B     Eighth-order polynomial         9         0.0799    86.3      15.4    24.4
17.1C     Exponential model               3         0.0991    83.1      16.4    19.4
17.1C     Inverse linear model            3         0.0965    83.5      15.9    18.9

There are many variations and extensions of these models available in statistical
software, as well as other types of non-linear models. In addition, most software allows
user-defined non-linear functions to be fitted. Successful estimation of parameters in
non-linear models depends on the amount of information available from the observations, as
well as good initial values for the parameters. For example, the logistic or Gompertz curves
require some observations in the upper part of the curve to get a good estimate of the
upper asymptote. Similarly, the precision of the curvature and position parameters (γ and
δ) for these models becomes greater as the number of observations between the asymptotes
increases. In general, it is desirable to have observations spread across the full range
of the curve, and particularly at points where the slope of the curve changes. The
parameterization used within non-linear models can influence the stability of the estimation
procedure, and statistical packages use various different parameterizations. For example,
the exponential model defined in Equation 17.7 can be written in an alternative form as

$$y_i = \alpha + \beta \varphi^{x_i} + e_i ,$$

where γ = -log_e(φ) and the other parameters retain their original interpretation. It can be
helpful to try different parameterizations when problems with convergence are encountered,
or to obtain parameters that have a biological interpretation.

Inference for non-linear models is not as straightforward as for linear models; in
particular, SEs for parameter estimates and predictions are approximate, so that different
statistical software can give somewhat different results. In most cases, approximate SEs and
t-tests are reported in addition to ANOVA tables with approximate F-tests. Obtaining SEs
for non-linear functions of parameters, for example, for γ = -log_e(φ), might also be
necessary and is usually achieved by the delta method, sometimes called linearization (see
e.g. Casella and Berger, 2002 or Seber and Wild, 1989).
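As an illustration of both points, the Python sketch below converts between the two parameterizations of the exponential model and applies the delta method to obtain an approximate SE for γ = -log_e(φ). The parameter estimates are those of the fitted exponential model in Example 17.1C, but the SE assumed for φ (0.05) is a hypothetical value chosen purely for illustration, and the variable names are ours. For a function g(φ), the delta method approximates SE[g(φ)] by |g'(φ)| × SE(φ); here g'(φ) = -1/φ.

```python
import math

# Estimates from the fitted exponential model in Example 17.1C.
alpha, beta, gamma = 4.405, -4.412, 0.359
phi = math.exp(-gamma)            # equivalent parameter in the alternative form


def curve_gamma(x):
    """Exponential model written with exp(-gamma * x)."""
    return alpha + beta * math.exp(-gamma * x)


def curve_phi(x):
    """The same curve written with phi ** x."""
    return alpha + beta * phi ** x


# Delta method: SE(gamma) is approximately |d gamma / d phi| * SE(phi) = SE(phi) / phi.
se_phi = 0.05                     # hypothetical SE, not taken from the example
gamma_from_phi = -math.log(phi)
se_gamma = se_phi / phi

print(f"phi = {phi:.4f}, gamma = {gamma_from_phi:.3f}, approximate SE(gamma) = {se_gamma:.4f}")
print(curve_gamma(7.5), curve_phi(7.5))   # the two forms give the same fitted value
```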

Non-linear models can easily be extended to cases where different groups are present,
and the approaches are analogous to those described in Chapter 15. The most general
models allow all parameters to be separate among groups (e.g. a different asymptote for
each group) and the most restrictive insist on common parameters across groups, with
many intermediate models to be investigated.

EXERCISES 

17.1* An experiment was done to establish the effectiveness of low doses of a fungicidal
compound. Six fractions of the standard dose (1, 1/2, 1/4, 1/8, 1/16 and 1/32) were
applied to individual leaves infected with a pathogen, and the number of colonies on
each leaf was counted after a given period. Leaves without fungicide applied were
included as a negative control. The design was a RCBD with three replicates, giving 21
leaves in total. File colonies.dat holds the unit numbers (ID), structural factors (Rep,
Leaf), the dose applied (variate Dose) and the number of colonies observed (variate
NColonies).

a. Find a transformation of the dose variable that gives an approximate linear
relationship with the number of colonies. Fit a SLR using this transformed
variate, including the design structure in your model. Check for lack of fit
and state the predictive model.

b. Find a non-linear model for the number of colonies that accounts for the
structure of the design. Check for lack of fit. Compare this non-linear model
with the transformation used in part (a) and state which model you prefer,
with reasons.

17.2 The microarray study described in full in Exercise 12.6 investigated gene
expression associated with senescence of leaves. File senescence.dat holds
design information (ID, variate Day, factor BiolRep) and the expression value for
three genes (variates CATMA3A13560, CATMA2A31585 and CATMA1A09000)
from each plant following normalization.*

a. Can you reasonably use polynomial regression to predict the expression of
genes CATMA2A31585 or CATMA1A09000 over time? Over what range are
your predictions reliable?

b. Can you improve on these predictions by using non-linear models?

17.3 Exercise 13.3 analysed a set of chickweed plants from a field trial to investigate
whether the number of seeds produced could be related to the plant biomass,
measured as dry weight (g). There was evidence of variance heterogeneity, but
the log-transformation required to stabilize the variance gave a curved
relationship. Here, we try to find a model for that curved relationship, but now
also include similar samples from several different experiments, carried out
in different years and in different crops. File cwtrials.dat holds unit numbers
(ID) with a code for each trial (Trial), the year (factor Year) and crop type
(factor Crop) as well as the number of seeds (variate NSeed) and dry weights
(variate DryWt) for 193 plants. Find a transformation of the explanatory variate
(DryWt) that linearizes the relationship with log(NSeed) and use regression
with groups to establish whether the relationship differs between crops or
years, or both. Identify a predictive model for the log-transformed number of
seeds. Write down and interpret this predictive model. (We re-visit these data
in Exercise 18.7.)

17.4 In Exercise 9.1, data from a field trial to investigate the effect of sulphur
fertilizer on the yield of spring barley were analysed using ANOVA (data in file
sulphur.dat). Now use polynomial regression to model grain yield as a function of
applied sulphur, accounting for the structure of the experimental design. Check
for lack of fit. Write down the predictive model and give a 95% CI for grain yield
with 25 kg S applied.

17.5 A microarray study was done to investigate the genes associated with infection
of leaves by fungal pathogens. Ninety-six plants were grown in a controlled
environment and the seventh leaf of each plant was excised at time zero and
a mock inoculation was carried out (to give a baseline measurement). There
were 24 sample points at 2-h intervals starting 2 h after the mock inoculation,
i.e. at 2, 4, 6 ... 48 h, and randomization was used to allocate four leaves to each
sample point. Gene expression at the designated time was measured for each
leaf. File botrytis.dat holds unit numbers (ID) and structural factors (Hour,
Leaf) with the expression values for one gene (variate CATMA1A00045). Plot
the data. What do you notice about the pattern over time? Can you model this
pattern using trigonometric regression? Is there any evidence of lack of fit to
this relationship?†

* Data from V. Buchanan-Wollaston (PRESTA), University of Warwick.

17.6 An experiment was done to measure the response of yield to dose of nitrogen
fertilizer. The design was a RCBD with four blocks of five treatments,
corresponding to 0, 50, 100, 150 and 200 kg/ha of nitrogen applied and the response
is plot yield. File fertilizer.dat contains the unit numbers (ID), structural
factors (Block, Plot), the amount of nitrogen applied (variate N) and the plot yields
(variate Yield). Find a non-linear model to describe the response of yield to
applied nitrogen. Check your model for misspecification and lack of fit. Write
down and interpret the predictive model.

17.7 The yield response of Brussels sprout to applied nitrogen was investigated
using a RCBD with three blocks of 13 plots. The treatments were 11 doses of
nitrogen, between 0 and 250 kg/ha, with two replicates of 150 and 200 kg/ha
per block. File sprouts.dat contains the unit numbers (ID), structural factors
(Rep, Plot), applied nitrogen (variate Nitrogen) and yield converted to tonnes
per hectare (variate Yield). Plot the data and establish a predictive model to
describe the pattern of response.‡

17.8 Exercise 15.10 developed a model for oxygen consumption of wireworm larvae 
in terms of bodyweight and temperature groups. Now form a variate version of 
the temperature factor and investigate whether a surface can be developed in 
terms of temperature and bodyweight. Check your final predictive model for 
lack of fit and produce a visual representation of its fit. Write down and interpret 
your predictive model and comment on its usefulness. 

17.9 Exercise 11.2 fitted a model for linseed yield in terms of barley and chickweed
densities as factors. Fit a surface model for linseed yield using the two
explanatory variables as variates. Use visual checks for model misspecification as well
as formal tests for lack of fit. Write down your predictive model and state the
range of values over which it can be considered reliable.

17.10 Exercise 15.3 established a separate lines model for splash heights in terms of
velocity and weight classes. Now fit a surface in terms of the two explanatory
variates. First, extract the estimates of intercept and slope for each weight class
from the separate lines model and plot these against the weight values. Find a
transformation of the weight variate that makes these two patterns into
approximately straight lines - this is the transformation of weight to use in the surface
model. Identify a parsimonious predictive model for your surface in terms of
velocity and the transformed weight variate. Write down an equation for your
predictive model and visualize it as a surface or contour plot. Does this give a
better or worse fit than a surface constructed from the untransformed weight
variate?

† Data from K. Denby (PRESTA), University of Warwick.

‡ Data from Horticulture Research International.

Models for Non-Normal Responses: Generalized Linear Models



We have seen in the previous chapters that linear models relate a response variable to one
or more explanatory variables (factors or variates) and that inferences from these models
rely on assumptions about the distribution of the response, often expressed in terms of
properties of the model deviations (presented in Sections 4.1 and 12.1). Two of the most
important assumptions are that the deviations, and hence the observed responses, have
a common variance and follow a Normal distribution. These properties are required to
make the F- and t-distributions valid for statistical inferences such as hypothesis testing
and calculation of confidence intervals (CIs). In Chapter 5, we presented a set of diagnostic
tools that can be used to check those assumptions. In Chapter 6, we then suggested
transformation of the response variable to correct for heterogeneity of the variance and to
make the distribution of the deviations approximate to a Normal distribution. However, for
some types of response we expect, in advance of any statistical analysis, that their
distributions will not be Normal, that their variances will be heterogeneous and that
transformation might be unsatisfactory. Moreover, we can sometimes explicitly write the form
of these distributions from knowledge of the underlying process(es) that generated the
responses. Specifically, here we consider proportions that have been calculated from discrete
counts (e.g. number of plants out of 20 affected by a disease) which are likely to have a
Binomial distribution (Section 2.2.1), and responses that are generated as discrete counts
(e.g. number of beetles caught in a pitfall trap during 24 h) which are likely to have a
Poisson distribution (defined in Section 18.3). Responses with these probability
distributions, and some others to be discussed later, can be analysed with generalized linear
models (GLMs). This broad class of models allows the response to arise from one of several
different probability distributions, extending the methods to situations other than the
Normal distribution; however, this additional flexibility means that somewhat more complex
estimation and inferential techniques are required.

In this chapter, we briefly introduce GLMs for Binomial and Poisson responses and
describe the underlying models. We start with a general overview of the GLM model
(Section 18.1). We consider a GLM for proportions with a Binomial distribution (Section
18.2), including some discussion about the detection and handling of over-dispersion
(Section 18.2.2), checking model assumptions (Section 18.2.3) and the special case of
binary responses (Section 18.2.7). We then introduce GLMs for discrete counts with a
Poisson distribution (Section 18.3), and we end by describing briefly some other situations
in which GLMs are used and some further extensions to these models (Section 18.4).

18.1 Introduction to Generalized Linear Models

A generalized linear model (GLM) extends the linear model framework to the situation
where the responses have certain forms of non-Normal distributions, specifically
distributions within the exponential family (Dobson, 1990) such as the Binomial and Poisson
distributions. A Binomial distribution typically occurs where a fixed number of samples
have been tested and the number passing (or failing) the test is counted. For example, in a
survey of water sources we might take 20 separate samples from each source and count the
number out of 20 with concentrations of mercury exceeding the maximum limit permitted
for drinking water. A Poisson distribution typically occurs where responses consist
of discrete counts. For example, we might count the number of viable seeds produced by
individual plants to compare productivity across different varieties of a plant species. A
GLM directly accounts for the particular characteristics of the distribution associated with
a response and uses these characteristics in parameter estimation and inference. However,
because we are now dealing with non-Normal distributions, we must modify the form of
our models. Recall that in Section 1.3 we wrote our statistical model in the form

response = systematic component + random component ,

where the response was a numerical outcome, the systematic component was a mathematical
function of one or more explanatory variables (factors, variates or both) and the random
component (or model deviations) accounted for variation in the response not explained by
the systematic component. Unfortunately, this partitioning of the model is specific to the
Normal distribution and does not apply in a straightforward manner to non-Normal
distributions. In general, it is more convenient to state the distribution of the response and
to write the model in a different form as

E(response) = systematic component ,

i.e. the expected value of the response is equal to the systematic component of our model.
In mathematical terms, we often write this as E(y_i) = μ_i, where μ_i is the expected value of
y_i, the response for the ith observation. The systematic component of the model is still a
mathematical function of the explanatory variables, but now we allow a more complex
form that involves a transformation. This transformation is used to account for boundaries
on the range of possible values of the response variable, which should therefore also
apply to the expected value. For example, for Poisson responses, the expected value must
remain positive, while for Binomial responses where m tests have been made on each
individual, the expected value must lie between 0 and m. The systematic component can then
be expressed in general form as

g(systematic component) = linear function of explanatory variables ,

where the function g() is called the link function because it provides the link between the
response and the explanatory variables. The linear function of the explanatory variables
on the right-hand side of this equation, which may comprise any combination of factors
and variates, is known as the linear predictor. Various different link functions can be
used, but each distribution has a canonical link which has good mathematical properties
and often works well in practice. For Binomial and Poisson responses, the logit and the log
are the canonical link functions, respectively.

Once a model has been defined, the model parameters can be estimated. For non-Normal
distributions, the simple method of least squares is no longer appropriate for parameter
estimation. Instead, the principle of maximum likelihood estimation is used. We do
not go into mathematical details here, but Dobson (1990) or Collett (2002) provide a good
description. One of the consequences of this change is that, instead of obtaining exact SEs
for parameter estimates, the estimated SEs become approximate, as does the calculation of
CIs and hypothesis tests. These issues will be discussed in the following sections, where
we consider the cases of Binomial and Poisson responses in more detail.



18.2 Analysis of Proportions Based on Counts: Binomial Responses 

The Binomial distribution, which was introduced in Section 2.2.1, usually arises as the
distribution of the number of successes out of a series of m independent binary tests (i.e. tests
with only two possible outcomes: success or failure), where all tests have the same
probability of success. In the context of a GLM, we have N Binomial responses, each of which is
the result of a number of binary tests. The ith response consists of two pieces of
information: the number of tests, denoted $m_i$, and the number of successes, denoted $y_i$. Note that
the number $y_i$ can take only integer values in the set 0, 1, 2, ..., $m_i$ for i = 1 ... N. If only one
test is made on each unit, so that $m_i = 1$, then we have binary observations that have only
two possible values, zero or one. Many of the useful properties that apply to Binomial data
fail in the case of binary data, and this is discussed in Section 18.2.7.

EXAMPLE 18.1A: DEMETHYLATION EXPERIMENT 

This experiment is a pilot study intended to calibrate a scientific procedure. A demeth- 
ylation agent is applied to plants: the agent has the effect of converting methylated 
nucleotides to non-methylated form, causing epigenetic changes that lead to abnormal 
phenotypes such as stunting and deformation (Amoah et al., 2008). The pilot study 
aimed to investigate the relationship between dose and the resulting proportion of 
plants with a normal phenotype. Seed was treated with the demethylation agent at 
six doses, including a zero control dose. Plants were grown in trays, each tray sown 
with seeds treated with the same dose of agent and each dose was replicated in four 
trays: two with 60 plants, and two with 100 plants. The trays were arranged as a CRD 
(Chapter 4). Table 18.1 lists the number of plants with a normal phenotype in each tray 
($\mathit{Normal}_i$, i = 1 ... 24) with the number of plants per tray ($\mathit{Total}_i$). The data can also be



TABLE 18.1

Number of Normal Plants (Total Number of Plants) per Tray for Doses of Demethylation Agent
(Example 18.1A and File demethylation.dat)

                                 Dose
0           0.01        0.1         0.5         1.0         1.5
59 (60)     58 (60)     54 (60)     4 (60)      3 (60)      3 (60)
58 (60)     59 (60)     53 (60)     11 (60)     2 (60)      3 (60)
99 (100)    98 (100)    88 (100)    14 (100)    2 (100)     1 (100)
98 (100)    99 (100)    87 (100)    15 (100)    1 (100)     3 (100)


Source: Data from S. Amoah, Rothamsted Research. 
found in file demethylation.dat which contains explanatory variate Dose, response
variate Normal and variate Total containing the number of plants for each tray, each
identified using dummy index variate DTray (the original layout of trays was not
recorded). Figure 18.1 shows the proportions of normal plants ($\mathit{Prop}_i = \mathit{Normal}_i / \mathit{Total}_i$)
plotted against the dose applied. We can think of the agent acting on each seed
independently, with the probability of producing a normal phenotype dependent on the
dose applied. We therefore expect the number of plants with a normal phenotype in
each tray to have a Binomial distribution.



Observations expected to follow a Binomial distribution with $m_i$ tests can be denoted
as $y_i \sim \mathrm{Binomial}(m_i, p_i)$, where $p_i$ is the underlying probability of success in each test, with
$0 < p_i < 1$. The probability of observing a response $y_i$ for the ith observation can then be
written as

$$\mathrm{Prob}(y_i; m_i, p_i) = \frac{m_i!}{y_i!\,(m_i - y_i)!}\; p_i^{\,y_i} (1 - p_i)^{m_i - y_i} \quad \text{for } y_i = 0 \ldots m_i .$$

This probability depends on both the number of tests, which is a known value, and on
the probability of success, $p_i$, an unknown parameter. We hypothesize that the probability
of success may depend on explanatory variables, for example, in Example 18.1A, that the
probability of obtaining a normal plant depends on the dose of the demethylation agent
applied. If $y_i$ follows a Binomial distribution, then its expected value and variance are,
respectively

$$E(y_i) = \mu_i = m_i p_i ; \qquad \mathrm{Var}(y_i) = m_i p_i (1 - p_i) .$$

The expected value is the product of the number of tests and the probability of success in
each test. As the number of tests is fixed (once the data have been obtained), modelling the
probability of success is equivalent to modelling the expected value. The variance is also
a function of the number of tests and probability of success. This variance is small if $p_i$ is



FIGURE 18.1

Proportion of normal plants per tray plotted against dose of demethylation agent (Example 18.1A).
close to either zero or one, and increases to its maximum at $p_i = 0.5$. This heterogeneity can
be seen in Figure 18.1, where the observed proportions close to 0 and 1 show less variation
than the proportions between 0.10 and 0.20 (for dose equal to 0.5). The variance can also be
written in terms of the expected value as

$$\mathrm{Var}(y_i) = \frac{\mu_i (m_i - \mu_i)}{m_i} ,$$

illustrating that the variance is a direct function of the expected value and the number of
tests ($m_i$) for each unit.
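
As an informal check of these formulae, the short Python sketch below evaluates the Binomial probabilities, expected value and variance directly. The function names are our own and the tray size of 60 is used purely as an illustration; it shows that the variance is largest at p = 0.5 and that the two forms of the variance agree.

```python
import math


def binomial_pmf(y, m, p):
    """Probability of y successes in m independent tests, each with success probability p."""
    return math.comb(m, y) * p ** y * (1.0 - p) ** (m - y)


def binomial_mean(m, p):
    """Expected value: number of tests times probability of success."""
    return m * p


def binomial_var(m, p):
    """Variance written in terms of the success probability."""
    return m * p * (1.0 - p)


def binomial_var_from_mean(m, mu):
    """The same variance written in terms of the expected value mu = m * p."""
    return mu * (m - mu) / m


m = 60  # illustrative number of tests per unit
for p in (0.05, 0.5, 0.95):
    print(p, binomial_mean(m, p), binomial_var(m, p))
```

The printed variances are small for p near 0 or 1 and largest at p = 0.5, which is the heterogeneity described in the text.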



18.2.1 Understanding and Defining the Model 

To aid understanding, we introduce the GLM for Binomial responses with a single
quantitative explanatory variable (variate) using notation like that in the previous chapters.
Later, we shall write models for qualitative variables (factors) or a mixture of factors,
variates and interactions. To make clear the distinction between the expected value of the data
and its transformed value, we write $g(\mu_i) = \eta_i$, so $\eta_i$ represents the expected value of the ith
observation after transformation by the link function. Note that this usage of $\eta$, which is
standard notation for GLMs, is somewhat different from that in previous chapters. The
systematic component of a model with a single explanatory variate is then

$$\eta_i = g(\mu_i) = \alpha + \beta x_i ,$$

so that, after transformation by the link function, the expected value of the ith observation
is a straight line function of the explanatory variate, $x_i$. Recall from Chapter 12 that
parameter $\alpha$ is the intercept of this straight line and parameter $\beta$ is the slope. As stated earlier, the
right-hand side of this equation is called the linear predictor, and so this is often referred
to as the model on the transformed or linear predictor scale. For now, we concern
ourselves with the logit link function, which is the canonical link for Binomial data, so that
our model with a single explanatory variate can be written as

$$\eta_i = \log_e\left(\frac{\mu_i}{m_i - \mu_i}\right) = \alpha + \beta x_i .$$

We can rewrite the logit function in terms of the success probability, as

$$\log_e\left(\frac{\mu_i}{m_i - \mu_i}\right) = \log_e\left(\frac{m_i p_i}{m_i - m_i p_i}\right) = \log_e\left(\frac{p_i}{1 - p_i}\right) = \mathrm{logit}(p_i) .$$

This illustrates that the model above can equivalently be considered as

$$\eta_i = \mathrm{logit}(p_i) = \alpha + \beta x_i , \qquad (18.1)$$
FIGURE 18.2

Observations (•) with fitted GLM ( — , Binomial distribution and logit link) for explanatory variate Dose plotted 
on (a) natural and (b) logit scale (Example 18.1B). 



i.e. the logit of the success probability is a linear function of the explanatory variate.
The Binomial GLM with logit link is therefore often called logistic regression. The
quantity $p_i/(1 - p_i)$ is known as the odds (in favour of success), so $\mathrm{logit}(p_i)$ is equivalent to the
logarithm of the odds, or log-odds. Hence, another interpretation of this model is that the
log-odds is a linear function of the explanatory variate. By rearranging Equation 18.1, we
can write the model in terms of the success probability as

$$p_i = \frac{\exp(\eta_i)}{1 + \exp(\eta_i)} = \frac{\exp(\alpha + \beta x_i)}{1 + \exp(\alpha + \beta x_i)} . \qquad (18.2)$$

This is often called the model on the back-transformed or natural scale. This model is a
non-linear function of the explanatory variate x (see Figure 18.2a). Given estimates of the
parameters, this formula can be used to predict the success probability for any value of
the explanatory variate. If we multiply Equation 18.2 by $m_i$, then we can write this model
equivalently in terms of the expected value as

$$\mu_i = m_i p_i = m_i \frac{\exp(\eta_i)}{1 + \exp(\eta_i)} .$$



Putting all of these properties together, we can interpret the Binomial GLM with logit link
as a non-linear model that accounts for the Binomial distribution of the responses and
its associated heterogeneity. To give a symbolic form for a GLM, we extend our previous
definition to include the probability distribution and link function. This is illustrated in
Example 18.1B.

As stated above, parameter estimation is achieved by the method of maximum
likelihood, which is beyond the scope of this book. Here, we quote results obtained from
GenStat rather than deriving estimates directly. Once parameter estimates have been
obtained, the fitted model should be checked for misspecification. You can achieve this by
plotting the observations and fitted values against the explanatory variable (see Section
13.1) to check that the fitted model follows the trend in the data.
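
For readers curious about what the software does, the following Python sketch fits a Binomial GLM with logit link and one explanatory variate by iteratively reweighted least squares, a standard algorithm for obtaining maximum likelihood estimates in GLMs. It is a minimal illustration under our own assumptions, not a description of how GenStat is implemented: the function name, the fixed number of iterations and the synthetic check data are all hypothetical.

```python
import math


def fit_logistic_irls(x, y, m, n_iter=25):
    """Fit logit(p_i) = a + b * x_i to Binomial counts y_i out of m_i tests.

    Uses iteratively reweighted least squares to maximize the Binomial
    likelihood and returns the estimates (a, b).
    """
    a, b = 0.0, 0.0  # start from p = 0.5 for every observation
    for _ in range(n_iter):
        sw = swx = swxx = swz = swxz = 0.0
        for xi, yi, mi in zip(x, y, m):
            eta = a + b * xi                    # linear predictor
            p = 1.0 / (1.0 + math.exp(-eta))    # inverse logit
            w = mi * p * (1.0 - p)              # weight = Binomial variance
            z = eta + (yi - mi * p) / w         # working response
            sw += w
            swx += w * xi
            swxx += w * xi * xi
            swz += w * z
            swxz += w * xi * z
        # Weighted least squares of z on x (2 x 2 normal equations).
        det = sw * swxx - swx * swx
        b = (sw * swxz - swx * swz) / det
        a = (swz - b * swx) / sw
    return a, b


# Synthetic check data (hypothetical, not from the book): the "counts" are set
# to their exact expected values, so the fit should recover the parameters.
TRUE_A, TRUE_B = 0.5, -1.2
x_demo = [-1.0, 0.0, 1.0, 2.0]
m_demo = [50, 50, 50, 50]
y_demo = [mi / (1.0 + math.exp(-(TRUE_A + TRUE_B * xi)))
          for xi, mi in zip(x_demo, m_demo)]

a_hat, b_hat = fit_logistic_irls(x_demo, y_demo, m_demo)
print(a_hat, b_hat)
```

A production routine would add a convergence test and guard against fitted probabilities of exactly 0 or 1; both are omitted here to keep the sketch short.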
EXAMPLE 18.1B: DEMETHYLATION EXPERIMENT

For the demethylation experiment introduced in Example 18.1A, we can fit a Binomial
GLM with logit link for the number of normal plants in the ith tray ($\mathit{Normal}_i$) in terms of
the dose applied to that tray ($\mathit{Dose}_i$). The model can be written as

$$\mathit{Normal}_i \sim \mathrm{Binomial}(\mathit{Total}_i, p_i) ; \qquad \eta_i = \mathrm{logit}(p_i) = \alpha + \beta\, \mathit{Dose}_i ,$$

where $p_i$ is the probability that $\mathit{Dose}_i$ gives a normal phenotype, and $\eta_i$ is its logit
transformation. We can write the model in an extension of our symbolic form as

Response variable:           Normal
Probability distribution:    Binomial (Number of tests = Total)
Link function:               logit
Explanatory component:       [1] + Dose



We have now included some additional information. First, as usual, we define the
response variable, which is the variate containing the number of successes per unit
(Normal). Then we specify the probability distribution of the response, here the
Binomial distribution. For this particular distribution, we must also define the number
of tests performed for each observation (here, the number of plants per tray, Total). We
then specify the link function, here the logit transformation. Finally, as usual, we give
the explanatory component of the model in terms of the explanatory variables, here the
intercept, [1], and the explanatory variate, Dose.

We obtain the estimated parameters for this model from GenStat as $\hat{\alpha} = 2.793$ and
$\hat{\beta} = -7.623$, giving the fitted model on the scale of the linear predictor, i.e. the logit
scale, as

$$\hat{\eta}_i = 2.793 - 7.623 \times \mathit{Dose}_i .$$

On the natural scale, the fitted probability of a normal phenotype for the ith observation
can be expressed as

$$\hat{p}_i = \frac{\exp(\hat{\eta}_i)}{1 + \exp(\hat{\eta}_i)} = \frac{\exp(2.793 - 7.623 \times \mathit{Dose}_i)}{1 + \exp(2.793 - 7.623 \times \mathit{Dose}_i)} .$$



Both forms are shown in Figure 18.2, with the observed proportion of normal plants 
(on the natural scale; Figure 18.2a) or the logit-transformed proportion (on the linear 
predictor scale; Figure 18.2b). 

This model is clearly misspecified (see Section 13.1), as the fitted lines deviate from the
trend in the plot. This is clearer on the scale of the linear predictor, where the trend in
the data is evidently non-linear although the form of the GLM demands a linear
trend on this scale. This shortcoming can be tackled either with a different link function
(see Section 18.2.7), with a polynomial function of the explanatory variate (see Section
17.1.2), or by transformation of the explanatory variable (as in Section 17.1.1), which is the
route we take here.

We refit the model in terms of the log-transformed explanatory variate logDose =
$\log_e$(Dose + 0.1). The offset of 0.1 (see Section 6.2.1) is required to deal with the zero
(control) dose, and has been chosen pragmatically (by inspection, using trial and error)
to give a reasonable straight line on the linear predictor scale. The revised model takes
the form

$$\hat{\eta}_i = -3.188 - 3.148 \times \mathit{logDose}_i ,$$



FIGURE 18.3 

Observations (•) with fitted GLM ( — , Binomial distribution and logit link) for explanatory variate loge(Dose + 0.1) 
plotted on (a) natural and (b) logit scale (Example 18.1B). 



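where $\mathit{logDose}_i = \log_e(\mathit{Dose}_i + 0.1)$. On the natural scale, this gives

$$\hat{p}_i = \frac{\exp(\hat{\eta}_i)}{1 + \exp(\hat{\eta}_i)} = \frac{\exp(-3.188 - 3.148 \times \mathit{logDose}_i)}{1 + \exp(-3.188 - 3.148 \times \mathit{logDose}_i)} .$$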



The fitted model is plotted on both the natural and linear predictor scales in Figure 18.3, 
which shows the fit to be much closer to the observed trend in the responses. 

The estimated intercept ($\hat{\alpha} = -3.188$) and slope ($\hat{\beta} = -3.148$) parameters relate to the
straight line fitted on the linear predictor, or logit, scale. The negative slope indicates
that the proportion of normal plants is smaller for larger doses. We can write the
predictive model as a continuous function of the original explanatory variable Dose as

$$\hat{\eta}(\mathit{Dose}) = -3.188 - 3.148 \times \log_e(\mathit{Dose} + 0.1) .$$

From this formula, we can make predictions for any dose (staying within the observed
range to avoid extrapolation). For example, for Dose = 0.3, with $\log_e$(Dose + 0.1) = -0.92,
the predicted response on the logit scale is

$$\hat{\eta}(\mathit{Dose} = 0.3) = -3.188 - (3.148 \times \log_e(\mathit{Dose} + 0.1)) = -3.188 - (3.148 \times -0.92) = -0.304 .$$

We can back-transform this prediction to estimate the probability of getting a normal
phenotype at this dose as

$$\hat{p}(\mathit{Dose} = 0.3) = \frac{\exp[\hat{\eta}(\mathit{Dose} = 0.3)]}{1 + \exp[\hat{\eta}(\mathit{Dose} = 0.3)]} = \frac{\exp(-0.304)}{1 + \exp(-0.304)} = \frac{0.738}{1.738} = 0.425 .$$



The estimated probability of our obtaining a normal plant after application of a dose of 
0.3 units is therefore equal to 0.425, i.e. a 42.5% chance of obtaining a normal plant. We 
can translate this to an expected number of normal plants per tray by multiplying by 
the number of plants in the tray. 
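
The prediction above can be reproduced with a few lines of Python. This is a sketch only: the coefficients are the GenStat estimates quoted in the text, while the function and constant names are our own.

```python
import math

ALPHA_HAT = -3.188  # estimated intercept on the logit scale (quoted in the text)
BETA_HAT = -3.148   # estimated slope for log_e(Dose + 0.1) (quoted in the text)
OFFSET = 0.1        # offset added to the dose before taking logs


def predict_logit(dose):
    """Predicted value on the linear predictor (logit) scale."""
    return ALPHA_HAT + BETA_HAT * math.log(dose + OFFSET)


def predict_probability(dose):
    """Back-transformed prediction: probability of a normal phenotype."""
    eta = predict_logit(dose)
    return math.exp(eta) / (1.0 + math.exp(eta))


eta_03 = predict_logit(0.3)
p_03 = predict_probability(0.3)
print(round(eta_03, 3), round(p_03, 3))

# Expected numbers of normal plants for the two tray sizes used in the experiment.
print(round(60 * p_03, 1), round(100 * p_03, 1))
```

Note that the calculation uses the unrounded logarithm of 0.4 rather than the rounded value -0.92 shown in the text.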

Having fitted a GLM that appears to give a reasonable description of the data in the 
fitted model plot, we must get some formal quantification of the fit and the uncertainty 
associated with the estimated parameters. These topics are discussed in the next sections. 
18.2.2 Assessing the Importance of the Model and Individual Terms: The Analysis of Deviance

In GLMs, the fit of a model is quantified by calculation of the deviance, which is a measure 
of the discrepancy between the fitted model and the data. This comparison is made via a 
function, called the log-likelihood function, which takes into account both the link trans- 
formation and the underlying distribution and compares the fit of the proposed model 
against a perfect or saturated model that fits each observation exactly. For a Binomial 
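distribution, the deviance for a model with fitted values $\hat{\mu}_i$ takes the form

$$\text{deviance} = 2 \sum_{i=1}^{N} \left[ y_i \log_e\left(\frac{y_i}{\hat{\mu}_i}\right) + (m_i - y_i) \log_e\left(\frac{m_i - y_i}{m_i - \hat{\mu}_i}\right) \right] . \qquad (18.3)$$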



The fit of a model is usually summarized in an analysis of deviance (ANODEV) table. The
ANODEV table starts with the total deviance of the observations, obtained as the deviance
of a null or baseline model that assumes that the expected value is equal for all
observations, i.e. $\eta_i = \eta$ for i = 1 ... N. This total deviance is partitioned into the change in deviance
that occurs when the explanatory component is fitted, here called the model deviance
(ModDev), and a remainder, the residual deviance (ResDev), which is the change in
deviance between the fitted and saturated models given by Equation 18.3. Each component
of the total deviance has degrees of freedom associated with it, and those for the residual
deviance are denoted ResDF. The ANODEV table is similar in spirit to the ANOVA table
used to summarize the fit of a linear model (see Chapters 4 and 12) and takes the general
form shown in Table 18.2 for a model with p (independent) parameters. For the model with
an intercept and a single explanatory variate, we have p = 2. Because the components of
the deviance generally increase as their degrees of freedom increase, it is helpful to divide
the contributions by their degrees of freedom to get mean deviances that are on a common
scale.
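
The Binomial deviance can be computed directly from the observed counts and fitted probabilities. The Python sketch below is an illustration with our own function names; it uses the usual convention that a term of the form 0 × log(0) is taken as zero, which is an assumption not spelled out in the text.

```python
import math


def _obs_log_ratio(obs, fit):
    """Return obs * log(obs / fit), taken as 0 when obs is 0."""
    return 0.0 if obs == 0 else obs * math.log(obs / fit)


def binomial_deviance(y, m, p_hat):
    """Deviance of fitted probabilities p_hat for counts y out of m tests."""
    total = 0.0
    for yi, mi, pi in zip(y, m, p_hat):
        mu = mi * pi  # fitted expected value
        total += _obs_log_ratio(yi, mu) + _obs_log_ratio(mi - yi, mi - mu)
    return 2.0 * total


# Small hypothetical data set (not from the book).
y_demo = [3, 7, 0, 10]
m_demo = [10, 10, 10, 10]

# Null model: one common probability for every observation.
p_null = sum(y_demo) / sum(m_demo)
print(binomial_deviance(y_demo, m_demo, [p_null] * len(y_demo)))

# Saturated model: fitted probabilities equal the observed proportions.
p_sat = [yi / mi for yi, mi in zip(y_demo, m_demo)]
print(binomial_deviance(y_demo, m_demo, p_sat))
```

The saturated model reproduces each observation exactly and so has zero deviance; the null model gives the total deviance for these illustrative data.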



EXAMPLE 18.1C: DEMETHYLATION EXPERIMENT 

In Example 18.1B, we modelled the number of normal plants ($\mathit{Normal}_i$) as a function of
the log-dose of agent applied. The ANODEV table for this model is Table 18.3.

The model deviance represents differences between the null model (with one param- 
eter representing the overall mean on the logit scale) and the fitted model, here a regres- 
sion on logged dose. The change in deviance between these two models is 1874.77, with 
1 df as one extra parameter has been added (the slope parameter). The residual devi- 
ance represents differences between the fitted model (in terms of logged dose) and the 
saturated model, which has an additional 22 parameters (to give 24 in total, one for each 
observation); the change in deviance here is much smaller. The total deviance represents 



TABLE 18.2

ANODEV Table for a GLM with p Parameters and N Responses without Over-Dispersion

Source of Variation    df       Deviance    Mean Deviance                  P (Chi-Squared)
Model                  p - 1    ModDev      ModMDev = ModDev/(p - 1)       Prob($\chi^2_{p-1}$ > ModDev)
Residual               N - p    ResDev      ResMDev = ResDev/(N - p)
Total                  N - 1    TotDev
TABLE 18.3

ANODEV Table for the Demethylation Experiment with Explanatory Variate
logDose = $\log_e$(Dose + 0.1) (Example 18.1C)

Source of Variation    df    Deviance    Mean Deviance    P (Chi-Squared)
Model                  1     1874.772    1874.772         < 0.001
Residual               22    26.623      1.210
Total                  23    1901.395


differences between the null and saturated models; the deviance and df from the model 
and residual contributions sum to the total values. 



The appropriate method for assessment of the model depends on whether there is
evidence of over-dispersion, and so we consider this issue next. The residual deviance
incorporates systematic discrepancies between the model and the observed responses, variation
between replicate observations (observations on independent experimental units with the
same values of the explanatory variables), and sampling variation arising from the
distribution of the data (here, the Binomial distribution). If there are no replicate observations
and the fitted model provides an adequate description of the systematic trend, then only
sampling variation contributes to the residual deviance. If this is true, then the residual
deviance has an approximate chi-squared distribution (see Section 2.2.4) with df equal to
the residual df. The null hypothesis that the model adequately describes the responses can
therefore be rejected at significance level $\alpha_s$ if the residual deviance exceeds the $100(1 - \alpha_s)$th
percentile of that chi-squared distribution. If this hypothesis is rejected, it indicates a poor
fit of the model to the observations, which may happen for several reasons. First, the
fitted model might not follow the observed patterns in the data (i.e. model misspecification),
as illustrated in Figure 18.2. In this case, the explanatory variate(s) may be transformed
to try to improve the fit (as in Example 18.1B), or an alternative link function might be
considered. For example, the logit link function requires the shape of the curve (on the
natural scale) to be symmetric around probability 0.5; one alternative, the complementary
log-log link function, allows some asymmetry in this relationship and will give a better fit
for some data sets. Second, the response may depend on explanatory variables that have
not been included in the model; additional explanatory variables should be tested to see if
they improve the model. Third, the assumed distribution might be incorrect. For example,
the Binomial distribution requires that the individual tests that comprise each observation
should be independent. If they are not, then the observed variation might not match that
expected for the Binomial distribution, and this will be reflected in the residual deviance.
Fourth, outliers or influential observations may have either distorted the model or inflated
the residual deviance. These different circumstances can be investigated with the methods
introduced in Chapter 13 and are discussed in Section 18.2.3. If replicate observations are
present, then variation between replicates might inflate the residual deviance even if the
model gives an adequate fit to the data, but the checks outlined above should still be made.

In general, the quality of the approximation to the chi-squared distribution for the
residual deviance improves as the number of observations increases; this is known as an
asymptotic approximation. For Binomial data, the approximation improves as both the
number of observations, N, and the number of tests per observation, $m_i$, increase. However,
the chi-squared approximation does not hold for binary data (i.e. $m_i = 1$), and the approach
for this situation is discussed in Section 18.2.7.

If the residual deviance is larger than expected when compared with critical values of
the appropriate chi-squared distribution, and if this cannot be dealt with by changing the
model, then there is more variation present than can be accounted for by the assumed
probability distribution. In this case, we say that the data show over-dispersion. The
simplest way to deal with over-dispersion is by extending the model to scale the variance
function. In a Binomial distribution, the scaled variance takes the form

$$\mathrm{Var}(y_i) = \phi\, m_i p_i (1 - p_i) = \phi\, \frac{\mu_i}{m_i} (m_i - \mu_i) .$$

The rationale for this approach is discussed by Collett (2002, Chapter 6). The parameter $\phi$ is
a scaling factor, called the dispersion parameter, which is used to summarize the degree
of over-dispersion present in the observations. Clearly, $\phi = 1$ corresponds to the original
model. This parameter can be estimated in several different ways. The deviance estimate
of the dispersion is equal to the residual mean deviance (ResMDev), i.e.

$$\hat{\phi} = \mathit{ResDev}/\mathit{ResDF} .$$



The Pearson estimate of the dispersion is equal to Pearson's chi-squared (goodness-of-fit)
statistic divided by the residual df,

$$\hat{\phi} = \frac{1}{\mathit{ResDF}} \sum_{i=1}^{N} \frac{(y_i - \hat{\mu}_i)^2}{\mathrm{Var}(\hat{\mu}_i)} = \frac{1}{\mathit{ResDF}} \sum_{i=1}^{N} \frac{(y_i - m_i \hat{p}_i)^2}{m_i \hat{p}_i (1 - \hat{p}_i)} , \qquad (18.4)$$

where $\mathrm{Var}(\hat{\mu}_i)$ is the variance function associated with the probability distribution (with
$\phi = 1$), evaluated at the estimated expected value for the ith observation. The default
method for estimation of the dispersion parameter varies between statistical packages.
Either of these parameter estimates can be used to give a more realistic assessment of the
contributions of explanatory variables in the ANODEV table, and to inflate the estimated
SEs of parameters to reflect the observed variation. However, estimation of the dispersion
parameter changes the way that contributions to the ANODEV table should be evaluated.
We must therefore establish whether over-dispersion is present before attempting to
interpret the ANODEV table.
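
The two dispersion estimates can be sketched in Python as follows. The function names and the two-observation data set are hypothetical and chosen only to make the arithmetic easy to follow; the residual deviance and df in the last line are the values quoted in Table 18.3.

```python
def pearson_chi_squared(y, m, p_hat):
    """Pearson's goodness-of-fit statistic for Binomial counts y out of m tests."""
    return sum((yi - mi * pi) ** 2 / (mi * pi * (1.0 - pi))
               for yi, mi, pi in zip(y, m, p_hat))


def dispersion_pearson(y, m, p_hat, res_df):
    """Pearson estimate of the dispersion: Pearson statistic / residual df."""
    return pearson_chi_squared(y, m, p_hat) / res_df


def dispersion_deviance(res_dev, res_df):
    """Deviance estimate of the dispersion: the residual mean deviance."""
    return res_dev / res_df


# Hypothetical illustration: two units of 10 tests, common fitted probability 0.5,
# so each unit contributes (y - 5)^2 / 2.5 to the Pearson statistic.
y_demo, m_demo, p_demo = [3, 7], [10, 10], [0.5, 0.5]
print(dispersion_pearson(y_demo, m_demo, p_demo, res_df=1))

# Deviance estimate from the residual line of Table 18.3.
print(round(dispersion_deviance(26.623, 22), 3))
```

For well-fitting models with reasonably large numbers of tests per unit, the two estimates are usually similar.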

EXAMPLE 18.1D: DEMETHYLATION EXPERIMENT 

In the ANODEV table for the model with log-dose of the demethylation agent (Table 
18.3), the residual deviance takes the value 26.62 on 22 df, with P = 0.226 when com- 
pared to the chi-squared distribution on 22 df. There is therefore no evidence of over- 
dispersion for this model. 
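
The observed significance level quoted above can be checked without statistical software, because the upper tail of a chi-squared distribution has a simple closed form when the df is even. The Python sketch below relies on that standard result; the function name is our own, and the restriction to even df is a limitation of this shortcut rather than of the test itself.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """Prob(chi-squared with df degrees of freedom > x), for even df only.

    For df = 2k the upper tail equals exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    term = 1.0   # the j = 0 term of the sum
    total = 1.0
    for j in range(1, df // 2):
        term *= half / j   # builds (x/2)^j / j! from the previous term
        total += term
    return math.exp(-half) * total


# Residual deviance and df from Table 18.3.
p_value = chi2_upper_tail_even_df(26.623, 22)
print(round(p_value, 3))
```

The result should be close to the P = 0.226 quoted in the example, i.e. well above conventional significance levels.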



18.2.2.1 Interpreting the ANODEV with No Over-Dispersion 

If there is no over-dispersion present, then the model and residual deviance contributions
approximately follow chi-squared distributions with degrees of freedom equal to the df
for each contribution. We can use the model deviance to test whether the inclusion of the
explanatory component has improved the fit when compared with the null model. The
null hypothesis is that the response is not related to the explanatory component. For a
model with a single explanatory variate, as in Example 18.1, the null hypothesis is
equivalent to $H_0$: $\beta = 0$. If the model deviance, i.e. ModDev, is larger than the $100(1 - \alpha_s)$th
percentile of the chi-squared distribution with degrees of freedom equal to the model df, then
this null hypothesis can be rejected at significance level $\alpha_s$, indicating that the explanatory
component has improved the fit compared with the null model.

EXAMPLE 18.1E: DEMETHYLATION EXPERIMENT 

In Example 18.1D, there was no evidence of over-dispersion for the model with log-dose 
of the demethylation agent. In the ANODEV table (Example 18.1C, Table 18.3), the model 
deviance represents the change on addition of the logDose explanatory variate into the 
model. This deviance takes the value 1874.77 with 1 df, with P < 0.001 when compared 
with the chi-squared distribution with 1 df. This test gives strong evidence that the pro- 
portion of normal phenotypes is related to the logged dose of the agent. 



18.2.2.2 Interpreting the ANODEV with Over-Dispersion

If over-dispersion is present, then we expect all the components of deviance to be inflated,
and so cannot compare them directly with a chi-squared distribution. Instead, we follow
an approach similar to that taken in an ANOVA table (Chapters 4 and 12). The deviance
contributions are divided by their degrees of freedom to get mean deviances that are on a
common scale (analogous to the mean squares in ANOVA). The ratio of the model mean
deviance (i.e. ModMDev) to the residual mean deviance (ResMDev) can then be used
to assess whether the explanatory variable(s) have improved the fit compared with the
null model. This introduces a new column of deviance ratios into the ANODEV table (see
Table 18.4). Under the null hypothesis that the response is not related to the explanatory
variable(s), the deviance ratio

$$F = \frac{\mathit{ModMDev}}{\mathit{ResMDev}}$$

has an approximate F-distribution, with numerator df equal to the model df (ModDF) and
denominator df equal to the residual df (ResDF).
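
The entries of the ANODEV table with over-dispersion follow mechanically from the two deviances and the numbers of parameters and observations. The Python sketch below lays out that arithmetic; the function name is our own and the input values are hypothetical, chosen only so that the results are easy to verify by hand.

```python
def anodev_with_dispersion(mod_dev, res_dev, n_params, n_obs):
    """Mean deviances and deviance ratio for a GLM with n_params parameters
    (including the intercept) fitted to n_obs responses."""
    mod_df = n_params - 1        # model df: parameters added to the null model
    res_df = n_obs - n_params    # residual df
    mod_mdev = mod_dev / mod_df  # model mean deviance (ModMDev)
    res_mdev = res_dev / res_df  # residual mean deviance (ResMDev)
    return {
        "ModDF": mod_df,
        "ResDF": res_df,
        "ModMDev": mod_mdev,
        "ResMDev": res_mdev,
        "F": mod_mdev / res_mdev,  # deviance ratio
    }


# Hypothetical values (not from the book).
table = anodev_with_dispersion(mod_dev=30.0, res_dev=40.0, n_params=4, n_obs=24)
print(table)
```

The deviance ratio would then be referred to an F-distribution with ModDF and ResDF degrees of freedom, using statistical tables or software.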

EXAMPLE 18.2A: LADYBIRD PREDATION 

An experiment was done to investigate factors affecting predation by the Harlequin 
ladybird. Ladybirds of known sex (factor Sex, with levels 1 = female and 2 = male) were 
put individually into dishes containing six items of prey, which were either pea aphids 
or lacewing larvae (factor Prey, with levels 1 = aphid and 2 = lacewing). The experiment 
was designed as a RCBD with four rows (blocks) of four Petri dishes (one per treatment 



TABLE 18.4

ANODEV Table for a GLM with p Parameters and N Responses with Over-Dispersion

Source of Variation    df       Deviance    Mean Deviance    Deviance Ratio           P (F)
Model                  p - 1    ModDev      ModMDev          F = ModMDev/ResMDev      Prob($F_{p-1,\,N-p}$ > F)
Residual               N - p    ResDev      ResMDev
Total                  N - 1    TotDev
combination) and was repeated on four occasions, although on one occasion only three
rows could be completed, because of a shortage of lacewing larvae. The number of 
whole prey eaten after 60 min was counted within each dish (variate Eaten) and could 
be reasonably assumed to have a Binomial distribution. The final data set for analysis 
consisted of 60 observations (15 rows each with four treatment combinations), given in 
file PREY.DAT and in Table 18.5. For simplicity, here we do not distinguish between occa- 
sions and label the rows as 1 ... 15 (factor Row), combining the variation due to occa- 
sions and rows (within occasions) into a single term. 

We wish to fit a GLM with Binomial distribution and logit link. Unfortunately, it is 
not possible to account properly for the structural component within the standard GLM 
framework, as there is no parallel to the multi-stratum ANOVA. As discussed in Section 
15.3, we therefore have to either use a different method (see Section 18.4) or take an 
approximate approach by combining the explanatory and structural components. For a 
RCBD, treatment effects are estimated via within-block comparisons and an intra-block 
analysis allows us to exclude block (row) effects before we assess treatment terms, and 
so we take this approach. We use a two-way crossed structure (Section 8.2) to model the 
four treatments. This model can be written in mathematical form as 

$$\mathit{Eaten}_{irs} \sim \mathrm{Binomial}(6, p_{irs}) ; \qquad \eta_{irs} = \mathrm{logit}(p_{irs}) = \eta_{111} + \mathit{Row}_i + \mathit{Sex}_r + \mathit{Prey}_s + (\mathit{Sex.Prey})_{rs} ,$$

where $\mathit{Eaten}_{irs}$ is the number of prey eaten in the ith row (i = 1 ... 15) by the rth sex (r = 1,
2 for 1 = female and 2 = male) with the sth prey type (s = 1, 2 for 1 = aphid, 2 = lacewing



TABLE 18.5

Number of Prey Eaten by the Harlequin Ladybird (Example 18.2A and File prey.dat) in an
Experiment with 15 Rows of Four Dishes, Each Containing One Ladybird (Female, F, or Male, M)
and Six Items of Prey (Pea Aphids, A, or Lacewing Larvae, L)

Row  Dish  Sex  Prey  Eaten    Row  Dish  Sex  Prey  Eaten    Row  Dish  Sex  Prey  Eaten
1    1     F    A     5        6    1     M    L     1        11   1     M    L     0
1    2     M    A     2        6    2     F    L     4        11   2     M    A     0
1    3     F    L     3        6    3     F    A     0        11   3     F    L     2
1    4     M    L     0        6    4     M    A     0        11   4     F    A     2
2    1     F    A     5        7    1     M    A     0        12   1     M    A     2
2    2     M    A     2        7    2     M    L     2        12   2     F    L     0
2    3     F    L     1        7    3     F    L     2        12   3     M    L     1
2    4     M    L     1        7    4     F    A     4        12   4     F    A     4
3    1     F    A     3        8    1     M    A     3        13   1     M    A     0
3    2     F    L     0        8    2     M    L     2        13   2     M    L     0
3    3     M    A     0        8    3     F    L     5        13   3     F    A     2
3    4     M    L     0        8    4     F    A     3        13   4     F    L     0
4    1     M    L     1        9    1     F    L     1        14   1     M    L     2
4    2     M    A     1        9    2     M    L     0        14   2     F    L     3
4    3     F    A     4        9    3     F    A     0        14   3     F    A     2
4    4     F    L     2        9    4     M    A     1        14   4     M    A     1
5    1     M    A     2        10   1     F    A     4        15   1     F    A     2
5    2     F    A     1        10   2     M    A     0        15   2     F    L     1
5    3     M    L     0        10   3     F    L     0        15   3     M    A     0
5    4     F    L     4        10   4     M    L     0        15   4     M    L     0



Source: Data from P. Wells, Rothamsted Research. 
larvae), and $p_{irs}$ is the probability that an item of prey in this category is eaten, with
logit transformation $\eta_{irs}$. We use first-level-zero parameterization (see Section 11.2.1), so
$\mathit{Row}_1 = 0$, $\mathit{Sex}_1 = 0$, $\mathit{Prey}_1 = 0$ and $(\mathit{Sex.Prey})_{rs} = 0$ for r = 1 or s = 1. Then, $\eta_{111}$ is the logit of
the probability for the first level of all factors (i.e. females in the first row with aphids),
$\mathit{Row}_i$ is the relative effect of the ith row, $\mathit{Sex}_2$ is the difference in response between males
and females (for aphid prey), $\mathit{Prey}_2$ is the difference in response between lacewing larvae
and aphids (for females) and $(\mathit{Sex.Prey})_{22}$ is the interaction effect, i.e. the additional
difference for the combination of a male ladybird with lacewing larvae. The explanatory terms
are the structural factor plus the two treatment factors and their interaction. On the logit
scale, the model fits a separate effect for each sex x prey combination and allows a shift
in the value for each row. This model is written with symbolic notation as



Response variable:          Eaten
Probability distribution:   Binomial (Number of tests = 6)
Link function:              logit
Explanatory component:      [1] + Row + Sex*Prey



The ANODEV table for this model is Table 18.6. The residual deviance is 69.66 with 
42 df with P = 0.005 (compared to a chi-squared distribution with 42 df). There is there- 
fore evidence of over-dispersion for this model. We first consider whether we can deal 
with this by changing the model. As this is a designed experiment where we have fit- 
ted effects for each row and each treatment combination, and there are no additional 
explanatory variables, we cannot identify any deficiency in the model that might be 
corrected. We might attribute the over-dispersion to variation between the behaviour of 
individual ladybirds, but we cannot usefully account for this within a simple model. We 
therefore include a dispersion parameter to model the over-dispersion, here estimated
as the residual mean deviance. 



$$ \hat{\varphi} = \frac{\mathrm{ResDev}}{\mathrm{ResDF}} = \frac{69.659}{42} = 1.659 . $$
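The test of the residual deviance and the dispersion estimate can be reproduced with a short sketch. It is only illustrative: the helper name `chi2_sf_even_df` is our own, and it relies on the closed form of the chi-squared upper tail that holds when the df is even (as it is here, with 42 df), so it is not a general-purpose routine.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable with even df.

    For df = 2k the tail equals exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut only handles positive even df")
    half = x / 2.0
    term = 1.0   # the j = 0 term, (x/2)^0 / 0!
    total = 1.0
    for j in range(1, df // 2):
        term *= half / j   # update (x/2)^j / j! from the previous term
        total += term
    return math.exp(-half) * total

res_dev, res_df = 69.659, 42            # residual deviance and df quoted in the text
p_value = chi2_sf_even_df(res_dev, res_df)
dispersion = res_dev / res_df           # residual mean deviance

print(f"P = {p_value:.3f}, estimated dispersion = {dispersion:.3f}")
```

A small P-value here points to over-dispersion, and the ratio of residual deviance to residual df is the estimate of the dispersion parameter used in the subsequent F-tests.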



The model deviance represents the change in deviance when all of the explanatory
terms are added into the model, with 17 df: 14 df for the 15 row effects (term Row) and 3
df for the crossed structure Sex*Prey (four treatment combinations). To assess whether
this model explains any variation in the response, we use the ratio of the mean devi- 
ance for the model (4.214) with the residual mean deviance, to get 2.54 (= 4.214/1.659). We 
compare this deviance ratio to an F-distribution with 17 and 42 df, giving observed sig- 
nificance level P = 0.007, and so conclude that there is statistical evidence that the model 
explains some of the patterns in predation. We investigate the importance of individual 
model terms in Example 18.2B. 

A good strategy for analysis is to fit an initial model with the dispersion parameter
set equal to one, assess the quality of the fit (see Section 18.2.3) and, when the fit appears
adequate, to formally test whether the dispersion parameter is equal to one as shown
above. Remember that this test is reliable only when the residual df is reasonably large
and, for Binomial data, when the number of tests per observation (m) is not too small. If
there is evidence that the dispersion parameter is larger than 1, then over-dispersion is
present and the analysis should proceed accordingly.

Occasionally, the dispersion parameter might appear to be substantially less than 1 and
then under-dispersion should be considered as a possibility. Under-dispersion occurs
where, for a given distribution, we detect less variation than expected, with $\varphi < 1$. This is
less common than over-dispersion and is often difficult to interpret or explain, and it is
sensible to be wary in this situation. If the dispersion parameter is estimated as smaller
than 1 when it is in fact equal to 1, then the significance of hypothesis tests will be inflated
and estimated SEs will be too small. To avoid these problems, leave the dispersion
parameter equal to 1 in cases of apparent under-dispersion.

TABLE 18.6
ANODEV Table for the Ladybird Predation Experiment (Example 18.2A)

Source of Variation    df    Deviance    Mean Deviance    Deviance Ratio    P(F)
Model                  17      71.635        4.214             2.54         0.007
Residual               42      69.659        1.659
Total                  59     141.294



18.2.2.3 The Sequential ANODEV Table 

If the explanatory component consists of several different model terms, then we can
calculate a set of incremental deviances (and df) from the change in deviance (and df) that
occurs on successive addition of individual terms into the model, producing a sequential
ANODEV table analogous to the sequential ANOVA tables introduced in Sections 11.2.2
and 15.4.1. If there is no evidence of over-dispersion, then the incremental deviance is
compared with a chi-squared distribution with df equal to the incremental df obtained
on addition of the term into the model. If over-dispersion is present, then the deviance
ratio for the term (incremental deviance divided by the incremental df, all divided by the
residual mean deviance) is compared to an F-distribution, as illustrated in Example 18.2B.
Because of the non-linear nature of the GLM, terms that would be orthogonal in a linear
model (Section 11.1) will not be orthogonal in a GLM, i.e. the sequential deviance for a term
in an ANODEV table depends on the order in which that term is added into the model, as
illustrated in Example 18.2B. We can also construct a set of marginal deviances by
calculating the change in deviance when a term is dropped from the model (cf. Sections 11.2.3
and 15.4.2). In general, we follow the strategies for model selection outlined in Section 15.5.1
to obtain a predictive model.



EXAMPLE 18.2B: LADYBIRD PREDATION 

Two sequential ANODEV tables for the ladybird predation experiment are shown in 
Table 18.7. In both sequences, we fit the structural factor Row first, followed by the 
explanatory crossed structure, which we fit as Sex*Prey in Table 18.7a and as Prey*Sex 
in Table 18.7b. First, we consider the former case. As we identified over-dispersion in 
Example 18.2A, individual model terms are assessed on their deviance ratios, with 
ResMDev = 1.659. Using similar notation for incremental deviances as that developed 
for incremental sums of squares earlier (Sections 11.2 and 15.4), we calculate the incre- 
mental deviance ratio for factor Sex as 

$$ F^{S} = \frac{\mathrm{Dev}(+\mathit{Sex} \mid [1]) \,/\, \mathrm{df}(+\mathit{Sex} \mid [1])}{\mathrm{ResMDev}} = \frac{33.471/1}{1.659} = 20.18 . $$



The numerator df for the F-statistic are the incremental df for the term added into the
model and the denominator df are the residual df. Deviance ratios for other terms are
calculated similarly from their incremental deviances and df.

TABLE 18.7
Two Sequential ANODEV Tables (Deviance Not Shown) for the Ladybird Predation Experiment
with Explanatory Factors Row, Sex and Prey (Example 18.2B)

(a)
Source of Variation    df    Mean Deviance    Deviance Ratio     P(F)
+ Row                  14        2.34         F^R  =  1.41       0.191
+ Sex                   1       33.47         F^S  = 20.18     < 0.001
+ Prey                  1        5.15         F^P  =  3.10       0.085
+ Sex.Prey              1        0.27         F^SP =  0.16       0.690
Residual               42        1.66
Total                  59

(b)
Source of Variation    df    Mean Deviance    Deviance Ratio     P(F)
+ Row                  14        2.34         F^R  =  1.41       0.191
+ Prey                  1        4.62         F^P  =  2.79       0.103
+ Sex                   1       34.00         F^S  = 20.50     < 0.001
+ Prey.Sex              1        0.27         F^PS =  0.16       0.690
Residual               42        1.66
Total                  59
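As a check on the arithmetic, the deviance ratios in sequence (a) of Table 18.7 can be recomputed from the tabulated mean deviances. This is a minimal sketch: the variable names are our own, and we use the rounded residual mean deviance (1.659) together with the incremental deviance for Sex quoted in the text (33.471), so results for other terms may differ from the table in the last digit.

```python
RES_MEAN_DEV = 1.659  # residual mean deviance on 42 df (Example 18.2A)

# (term, incremental df, incremental mean deviance) in fitting order, sequence (a)
sequence_a = [
    ("Row", 14, 2.34),
    ("Sex", 1, 33.471),
    ("Prey", 1, 5.15),
    ("Sex.Prey", 1, 0.27),
]

# Deviance ratio = incremental mean deviance / residual mean deviance
ratios = {term: mean_dev / RES_MEAN_DEV for term, _, mean_dev in sequence_a}

for term, df, _ in sequence_a:
    print(f"+ {term:<9} df = {df:>2}  deviance ratio = {ratios[term]:6.2f}")
```

Each ratio would then be referred to an F-distribution with the term's incremental df and the 42 residual df.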









Our aim is to identify a parsimonious predictive model, so we progressively drop
terms while respecting marginality (see Section 15.5). We therefore start by considering
the interaction term, which is not significant ($F^{SP}_{1,42} = 0.16$, P = 0.690). This test is the
same in both sequential ANODEV tables because the term is fitted last in both cases.
As the interaction is not significant, we can try to simplify the model further. We have
many residual df (ResDF = 42) and there are only these two sequential ANODEV tables,
so we can identify the predictive model from them. We therefore inspect the two main
effects. In a linear model with this structure, factors Sex and Prey would be orthogonal,
but Table 18.7 illustrates that this is not the case here, although the two tables are
similar. We find that factor Sex is statistically significant whether it is fitted before or after
factor Prey (P < 0.001 in both cases), and that factor Prey is not statistically significant
in either sequence (P > 0.085). As factor Row represents a structural term, we do not
consider removing it from the model (see Section 15.5). Our predictive model therefore
takes the form

Explanatory component: [1] + Row + Sex

Fitting this model leads to a residual mean deviance of 1.706 (with 44 df), and an
observed F-statistic for the Sex main effect of $F^{S}_{1,44} = 19.617$ (P < 0.001). So this
experiment gives strong evidence that the numbers of prey eaten by male and female ladybirds
differ, but no evidence of any preference between the two prey types. We explore this
difference between male and female ladybirds further in Examples 18.2D and 18.2E.



18.2.3 Checking the Model Fit and Assumptions 

The first step of model checking consists of plotting the fitted model with the observed 
data. Figure 18.2 demonstrated that, for a model with a single explanatory variate, prob- 
lems with model fit may be highlighted by plots of the fitted model on the scale of the lin- 
ear predictor, where a straight line is expected. The residual plots described in Chapters 
5 and 13 can also be used to give more information on the model fit, but the definition 
of the residuals needs to be extended for GLMs, and several methods are available. As 
previously, simple residuals can be defined as the difference between the observation, $y_i$,
and its fitted value, i.e. $y_i - \hat{\mu}_i$. However, these residuals are subject to the same
heterogeneity as the observations, and so are usually divided by the square root of the estimated
variance of the distribution, $\mathrm{Var}(\hat{\mu}_i)$, to give the set of Pearson residuals, defined for the
ith observation as

$$ e_{Pi} = \frac{y_i - \hat{\mu}_i}{\sqrt{\mathrm{Var}(\hat{\mu}_i)}} . $$



These residuals are called Pearson residuals as the sum of their squared values is equal 
to the Pearson goodness-of-fit statistic defined in Equation 18.4. Although we have now 
adjusted for heterogeneity of variance caused by the distribution of the observations, we 
still have to account for heterogeneity due to uncertainty in the predicted values (as in 
Section 13.2) and so we standardize the Pearson residuals by dividing them by their esti- 
mated SEs. 

An alternative set of residuals is constructed as the square root of the contribution
that each observation makes to the deviance (D, defined in Equation 18.3) multiplied by
the sign of the simple residuals; these are called the deviance residuals. For the Binomial
distribution, the deviance residuals are calculated for the ith observation as

$$ e_{Di} = \mathrm{sign}(y_i - \hat{\mu}_i)\sqrt{D_i} = \mathrm{sign}(y_i - \hat{\mu}_i) \sqrt{ 2 y_i \log_e\!\left(\frac{y_i}{\hat{\mu}_i}\right) + 2 (m_i - y_i) \log_e\!\left(\frac{m_i - y_i}{m_i - \hat{\mu}_i}\right) } . $$



The sum of the squared values of these deviance residuals is equal to the residual deviance
of the fitted model given in Equation 18.3. These residuals must also be standardized to
give a common variance for diagnostic plots. Both the standardized deviance and
standardized Pearson residuals can be generalized to prediction and deletion residuals via
the same 'leave-one-out' technique used to derive these residuals in the Normal case (see
Section 13.2).
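The two residual definitions can be sketched for a single Binomial observation as follows. This is an illustration rather than a full implementation: the function names are our own, the Binomial variance is taken as $\hat{\mu}_i(1 - \hat{\mu}_i/m_i)$, the fitted value is assumed to lie strictly between 0 and $m_i$, and the standardization step (which needs the leverages from the fitted model) is not shown.

```python
import math

def _xlogy(x, y):
    """Return x * log(y), using the convention that the term is 0 when x = 0."""
    return 0.0 if x == 0 else x * math.log(y)

def pearson_residual(y, m, mu):
    """Unstandardized Pearson residual for y successes out of m, fitted value mu."""
    variance = mu * (1.0 - mu / m)   # Binomial variance m * p * (1 - p) with p = mu / m
    return (y - mu) / math.sqrt(variance)

def deviance_residual(y, m, mu):
    """Unstandardized deviance residual for y successes out of m, fitted value mu."""
    contribution = 2.0 * (_xlogy(y, y / mu) + _xlogy(m - y, (m - y) / (m - mu)))
    sign = 1.0 if y >= mu else -1.0
    return sign * math.sqrt(max(contribution, 0.0))  # guard against tiny negative rounding

# Hypothetical dish: none of six prey eaten when three were expected
print(pearson_residual(0, 6, 3.0), deviance_residual(0, 6, 3.0))
```

Summing the squared deviance residuals over all observations recovers the residual deviance, and summing the squared Pearson residuals gives the Pearson goodness-of-fit statistic.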

Because the underlying probability distribution assumed for the observations is not
Normal, we do not necessarily expect the residuals to conform to a Normal distribution.
However, with a few exceptions, the standardized deviance residuals have been shown
to give a reasonable approximation to a Normal distribution. For the case of a Binomial
distribution, the exception is when the number of tests per observation, $m_i$, is small. In
general, the distribution of the standardized Pearson residuals may be less close to a Normal
distribution, and Collett (2002) shows some examples for Binomial data. The standardized
deviance residuals can therefore be considered analogous to the standardized residuals
discussed in Chapters 5 and 13, and are appropriate for use in the residual plots described
in those chapters.



EXAMPLE 18.2C: LADYBIRD PREDATION 

Figure 18.4 shows a composite set of residual plots with standardized deviance residu- 
als from the predictive model that describes numbers of prey eaten in terms of the 
factors Row and Sex (see Example 18.2B). Six diagonal stripes can be seen in the fitted 
values plot, running downwards from the left-hand side to the right-hand side of the 
graph. These stripes correspond to the six distinct observed responses (0, 1 ... 5), and 
this type of pattern is likely to be found in any data set with a small number of discrete 
responses. There appears to be a little more variation in the centre of the range, but we 
judge the fitted values plot to be acceptable given the small Binomial total (six) per dish.

FIGURE 18.4
Composite set of residual plots based on standardized deviance residuals for the ladybird
predation experiment (Example 18.2C).



The histogram and Normal probability plots suggest that the residuals give a reason- 
able approximation to a Normal distribution. These graphs therefore indicate no large 
discrepancies between the assumed model and the observed data. 



18.2.4 Properties of the Model Parameters 

As in the linear models seen previously, each parameter estimate in a GLM has an esti- 
mated SE that can be used for inference. The derivation of these SEs is beyond the scope 
of this book, but note that they are approximate and that they must include the multiplier
$\sqrt{\hat{\varphi}}$, i.e. the square root of the dispersion parameter, if this is estimated. If this multiplier is
not used when over-dispersion is present, then the SEs under-estimate the uncertainty in 
the parameter values and this could lead to incorrect conclusions. 

The decision on whether a term should be included in a model should be based on the 
sequential ANODEV table(s). A null hypothesis that a particular parameter is equal to 
zero can be tested by the parameter estimate divided by its SE, but remember that the 
interpretation and value of parameters associated with terms containing factors will 
depend on the parameterization of the model. Statistical software usually uses first- or 
last-level-zero constraints for GLMs (see Sections 4.5, 11.2 and 15.2 for further details). 
If there is no over-dispersion, then the ratio of a parameter to its SE has an approximate
Normal distribution. If the dispersion parameter is estimated, then this ratio has an
approximate t-distribution with degrees of freedom equal to the residual df. If the
absolute value of the ratio exceeds the $100(1 - \alpha_s/2)$th percentile of the appropriate
distribution, then the null hypothesis that the parameter is equal to zero can be rejected at
significance level $\alpha_s$. The SEs can be used to construct approximate CIs for parameter
values in the usual manner.

EXAMPLE 18.2D: LADYBIRD PREDATION 

The predictive model fitted in Example 18.2B can be written in mathematical form with
first-level-zero parameterization as

$$ \eta_{ir} = \mathrm{logit}(p_{ir}) = \eta_{11} + \mathit{Row}_i + \mathit{Sex}_r , $$

where $p_{ir}$ is the predicted probability that an item of prey is eaten in the ith row
(i = 1 ... 15) for the rth sex (r = 1, 2; 1 = female and 2 = male) with logit transformation
$\eta_{ir}$. Then, $\eta_{11}$ is the logit of the expected value for females in the first row, $\mathit{Row}_i$ is the
relative effect of the ith row and $\mathit{Sex}_2$ is the difference in response between males and
females on the logit scale. Table 18.8 shows the estimated parameters for this model with
their estimated SEs.

The estimated effect of male ladybirds is $\mathit{Sex}_2 = -1.550$ (SE 0.3712), which indicates
that males tended to eat less prey than females.
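The t-statistic for the Sex effect follows directly from the estimate and its SE, and an approximate 95% CI can be formed in the same way. The sketch below is illustrative only: the critical value 2.0154 is our assumption for the 97.5th percentile of a t-distribution with 44 df (the residual df of the predictive model), hard-coded to avoid any dependence on a statistics library.

```python
estimate, se = -1.550, 0.3712   # Sex effect (male relative to female) and its SE
T_CRIT_44 = 2.0154              # assumed 97.5th percentile of t with 44 df

t_stat = estimate / se
ci_lower = estimate - T_CRIT_44 * se
ci_upper = estimate + T_CRIT_44 * se

print(f"t = {t_stat:.3f}, approximate 95% CI = ({ci_lower:.3f}, {ci_upper:.3f})")
```

Because over-dispersion was modelled, the t-distribution rather than the Normal distribution is the appropriate reference here; an interval that excludes zero agrees with the F-test for Sex in Example 18.2B.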



TABLE 18.8
Parameter Estimates (First-Level-Zero Parameterization) with Standard Errors (SE),
t-Statistics (t) and Observed Significance Level (P), for the Ladybird Predation Experiment
with Explanatory Factors Row (15 Levels) and Sex (Two Levels, 1 = Female, 2 = Male)
(Example 18.2D)

Term      Parameter    Estimate      SE         t          P
[1]       eta_11         0.386     0.6013     0.642      0.524
Row 1     Row_1          0           -          -          -
Row 2     Row_2         -0.200     0.8259    -0.242      0.810
Row 3     Row_3         -1.774     1.0117    -1.754      0.086
Row 4     Row_4         -0.408     0.8354    -0.488      0.628
Row 5     Row_5         -0.626     0.8496    -0.737      0.465
Row 6     Row_6         -1.121     0.8983    -1.247      0.219
Row 7     Row_7         -0.408     0.8356    -0.488      0.628
Row 8     Row_8          0.582     0.8171     0.712      0.480
Row 9     Row_9         -2.246     1.1303    -1.987      0.053
Row 10    Row_10        -1.417     0.9420    -1.504      0.140
Row 11    Row_11        -1.417     0.9414    -1.505      0.139
Row 12    Row_12        -0.626     0.8492    -0.737      0.465
Row 13    Row_13        -2.246     1.1387    -1.973      0.055
Row 14    Row_14        -0.408     0.8355    -0.488      0.628
Row 15    Row_15        -1.774     1.0119    -1.753      0.086
Sex 1     Sex_1          0           -          -          -
Sex 2     Sex_2         -1.550     0.3712    -4.176    < 0.001

18.2.5 Evaluating the Response to Explanatory Variables: Prediction

In general, examination of estimated parameters from the predictive model has limited
scope, as it is usually the overall response to explanatory variables that is of interest. The
presence of the link transformation makes prediction for GLMs more complex than for
linear models, although the issues that arise are similar to those for the presentation of
results following analysis of transformed data (see Section 6.3).

Prediction on the linear predictor scale is straightforward, as on this scale the model
is linear and the estimated SE usually gives a good approximation of the uncertainty
associated with the predicted value. Prediction for a specific combination of explanatory
variables can be made on the linear predictor scale, then a CI can be generated from the
Normal distribution (no over-dispersion) or t-distribution (over-dispersion present), and
the prediction and its confidence limits can be back-transformed to the natural scale. While
software will calculate SEs on the natural scale (via the delta method), these SEs tend to be
much less accurate than those calculated on the linear predictor scale because they make
an additional set of approximating assumptions. Back-transformed CIs therefore tend to
give a better measure of uncertainty than these approximate SEs.

Further complications arise when averages over variables are required, or where the
main objective of the study is comparison between groups, or both.

Averaging over variables is required for predictions for a subset of the explanatory
variables. The usual procedure is to form predicted values for all combinations of the
explanatory variables, i.e. at specified values of variates and all levels of factors. In a
linear model, predictions for the variables of interest are then obtained as averages over
the remaining variables (Section 15.5.2). In a GLM, we must also consider
back-transformation to the natural scale and this leads to two possibilities, either averaging before
back-transformation or averaging afterwards, and these two strategies will give different
numerical results with different interpretations. This situation is discussed in the
context of analysis of transformed data by Morris (1985) and illustrated in Example 18.2E.
Averaging before back-transformation can be interpreted as making a prediction at an
average value of the remaining variables. This gives individual predictions with SEs
on the linear predictor scale, CIs can be formed for each prediction, and these CIs can
be back-transformed to give a realistic measure of uncertainty on the natural scale. If
instead the full set of predictions is back-transformed before averaging, this is analogous
to predicting an average response on the natural scale for an experiment in which the
predicted combination was applied with each combination of levels of the remaining
variables. Unfortunately, only approximate SEs on the natural scale can be calculated for
this type of prediction.

We now consider comparison between specific combinations of explanatory variables.
Comparisons can easily be made on the linear predictor scale, with appropriate SEs, and
so this is the scale on which you should test such comparisons. However, interpretation
of comparisons on the natural scale can be difficult. We illustrate this problem using an
experiment with a set of t treatment groups. We label the transform of the expected value
for the jth group on the linear predictor scale as $\eta_j$, and are interested in the quantity
$\eta_j - \eta_k$, with predicted value $\hat{\eta}_j - \hat{\eta}_k$. Ideally, as in the case of individual predictions, we
should like to take a CI for this quantity and map it on to a meaningful quantity on the
natural scale. This can be done for the log link function (see Section 18.3.1), since

$$ \hat{\eta}_j - \hat{\eta}_k = \log_e(\hat{\mu}_j) - \log_e(\hat{\mu}_k) = \log_e\!\left(\frac{\hat{\mu}_j}{\hat{\mu}_k}\right), $$

so the comparison on the log scale is the log of the ratio of the predictions on the natural
scale, and this ratio can be back-transformed and interpreted (see Example 18.3). For a logit
link function, we find that

$$ \hat{\eta}_j - \hat{\eta}_k = \mathrm{logit}(\hat{p}_j) - \mathrm{logit}(\hat{p}_k) = \log_e\!\left(\frac{\hat{p}_j/(1 - \hat{p}_j)}{\hat{p}_k/(1 - \hat{p}_k)}\right), $$

so the comparison on the logit scale is the log of the odds-ratio of the predictions on the
natural scale. Unfortunately, the odds-ratio is rather less interpretable. In general, if the
quantity of interest is the difference in expected values on the natural scale, i.e. $\mu_j - \mu_k$,
then there is no real alternative to back-transforming predictions and using the
approximate SE calculated on the natural scale. With link function g(), the difference is then
estimated as

$$ \hat{\mu}_j - \hat{\mu}_k = g^{-1}(\hat{\eta}_j) - g^{-1}(\hat{\eta}_k) . $$
EXAMPLE 18.2E: LADYBIRD PREDATION 

We established a predictive model in Example 18.2D, and now we want to understand 
how an estimated decrease for males of 1.55 units on the logit scale translates into num- 
ber of prey eaten. Table 18.9 lists the full set of predictions and the back-transformed 
proportions. The predictions for male ladybirds are 1.55 units smaller than for female 
ladybirds in the same row on the logit scale, but the same difference varies between 0.10 
(rows 9 and 13) and 0.37 (row 8) once back-transformed. 
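The contrast between a constant effect on the logit scale and a varying difference on the natural scale can be checked numerically. In this sketch the helper `expit` (the inverse logit) and the variable names are our own; the female logit predictions for three rows are taken from Table 18.9 and the Sex effect from Example 18.2D.

```python
import math

def expit(eta):
    """Inverse of the logit transformation."""
    return 1.0 / (1.0 + math.exp(-eta))

SEX_EFFECT = -1.550  # male relative to female, logit scale
female_logits = {"row 8": 0.968, "row 1": 0.386, "row 9": -1.860}

results = {}
for row, eta_f in female_logits.items():
    p_f = expit(eta_f)
    p_m = expit(eta_f + SEX_EFFECT)
    odds_ratio = (p_m / (1.0 - p_m)) / (p_f / (1.0 - p_f))
    results[row] = (p_f - p_m, odds_ratio)
    print(f"{row}: difference = {p_f - p_m:.2f}, odds-ratio = {odds_ratio:.3f}")
```

The odds-ratio is the same in every row (it equals the exponential of the Sex effect), whereas the difference in fitted probabilities depends on where the row sits on the logistic curve.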



TABLE 18.9
Predictions for Ladybird Predation on Linear Predictor Scale and Back-Transformed as
Probabilities for Each Sex in Each Row (Example 18.2D)

          Linear Predictor Scale (Logit)    Back-Transformed (Fitted Probability)
Row          Female        Male                 Female        Male
 1            0.386       -1.164                 0.595        0.238
 2            0.186       -1.364                 0.546        0.204
 3           -1.388       -2.934                 0.200        0.050
 4           -0.021       -1.571                 0.495        0.172
 5           -0.240       -1.790                 0.440        0.143
 6           -0.734       -2.284                 0.324        0.092
 7           -0.021       -1.571                 0.495        0.172
 8            0.968       -0.582                 0.725        0.359
 9           -1.860       -3.410                 0.135        0.032
10           -1.031       -2.581                 0.263        0.070
11           -1.031       -2.581                 0.263        0.070
12           -0.240       -1.790                 0.440        0.143
13           -1.860       -3.410                 0.135        0.032
14           -0.021       -1.571                 0.495        0.172
15           -1.388       -2.938                 0.200        0.050

To predict the difference in number of prey eaten between male and female ladybirds
in an average row, we take the average of the predictions on the logit scale as

$$ \hat{\eta}_{\cdot r} = \frac{1}{15}\sum_{i=1}^{15} \hat{\eta}_{ir} ; \qquad \hat{\eta}_{\cdot 1} = -0.553 \ (\text{SE } 0.2196), \qquad \hat{\eta}_{\cdot 2} = -2.103 \ (\text{SE } 0.3128) . $$

We can construct 95% CIs for these predictions on the logit scale as (-0.996, -0.110) for
females and as (-2.733, -1.473) for males. We back-transform these estimates and CIs,
and estimate the probability of an item of prey in an average row being eaten by female
ladybirds as 0.37 with 95% CI (0.27, 0.47), and by male ladybirds as 0.11 with 95% CI (0.06,
0.19). Note the asymmetry of the CI for the male ladybirds. Approximate SEs can be
calculated directly for these back-transformed predictions as 0.051 and 0.030 for female
and male ladybirds, respectively, so the approximation is better for the female than for
the male ladybirds.
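These back-transformed intervals and approximate SEs can be reproduced with a few lines of code. The sketch makes two assumptions of our own: the critical value 2.0154 is taken as the 97.5th percentile of a t-distribution with 44 df, and the approximate SE on the natural scale is obtained from the delta method for the logit link, i.e. p(1 - p) times the SE on the logit scale.

```python
import math

T_CRIT_44 = 2.0154  # assumed 97.5th percentile of t with 44 df

def expit(eta):
    """Inverse of the logit transformation."""
    return 1.0 / (1.0 + math.exp(-eta))

def back_transformed_ci(eta, se):
    """Return (estimate, lower, upper) on the probability scale."""
    lower, upper = eta - T_CRIT_44 * se, eta + T_CRIT_44 * se
    return expit(eta), expit(lower), expit(upper)

def delta_method_se(eta, se):
    """Approximate SE of the back-transformed prediction for a logit link."""
    p = expit(eta)
    return p * (1.0 - p) * se

female = back_transformed_ci(-0.553, 0.2196)
male = back_transformed_ci(-2.103, 0.3128)
print([round(v, 2) for v in female], round(delta_method_se(-0.553, 0.2196), 3))
print([round(v, 2) for v in male], round(delta_method_se(-2.103, 0.3128), 3))
```

Under these assumptions the output agrees with the values quoted above to the precision shown. The interval for males is visibly asymmetric about 0.11, which a symmetric interval built from the approximate SE could not reproduce.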

To predict the average difference in number of prey eaten between male and female
ladybirds across the whole experiment, we take the average of the back-transformed
predictions, as

$$ \hat{p}_{\cdot r} = \frac{1}{15}\sum_{i=1}^{15} \hat{p}_{ir} \quad \text{for } r = 1, 2, $$

giving a predicted average proportion of prey eaten of 0.38 (approximate SE 0.044) for
females and 0.13 (approximate SE 0.032) for males. In this example, these quantities
differ only a little from those averaged on the linear predictor scale. In general, the
appropriate scale for prediction will depend on the context of the study.
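The two averaging strategies can be compared directly using the female logit predictions from Table 18.9, with the male predictions obtained by adding the Sex effect of -1.550. This is a sketch with our own variable names; it reproduces the point estimates only, not the SEs.

```python
import math
from statistics import fmean

def expit(eta):
    """Inverse of the logit transformation."""
    return 1.0 / (1.0 + math.exp(-eta))

# Female predictions on the logit scale for rows 1 to 15 (Table 18.9)
female_logits = [0.386, 0.186, -1.388, -0.021, -0.240, -0.734, -0.021, 0.968,
                 -1.860, -1.031, -1.031, -0.240, -1.860, -0.021, -1.388]
male_logits = [eta - 1.550 for eta in female_logits]

for label, logits in (("female", female_logits), ("male", male_logits)):
    mean_logit = fmean(logits)
    before = expit(mean_logit)                   # average, then back-transform
    after = fmean(expit(eta) for eta in logits)  # back-transform, then average
    print(f"{label}: mean logit = {mean_logit:.3f}, "
          f"average before = {before:.2f}, average after = {after:.2f}")

mean_logit_f = fmean(female_logits)
avg_after_f = fmean(expit(eta) for eta in female_logits)
avg_after_m = fmean(expit(eta) for eta in male_logits)
```

Averaging before back-transformation describes an average row; averaging afterwards describes the average response over the rows actually used in the experiment.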



18.2.6 Aggregating Binomial Responses 

It is not always clear how Binomial responses should be recorded. For example, consider
an experiment looking at the prevalence of pests on different varieties within an orchard,
where four individual branches are assessed as clean or infested on six trees of each
variety. The investigator might wonder whether to record the results as binary scores (0 or 1)
for each branch, as the number of infested branches per tree (out of 4), or as the number of
infested branches per variety (out of 24). As long as we fit the same explanatory
component, we obtain the same parameter estimates at any of these scales, but we shall obtain a
different residual deviance. As a rule of thumb, we suggest that the appropriate scale for
analysis (and hence the minimum scale for recording measurements) is the smallest
experimental unit present in the study (see Section 3.1), as this avoids the issues with binary data
described in the next section. In our orchard example, this would be the individual tree,
as the variety changes between but not within trees. The residual deviance then reflects
expected tree-to-tree variation in the underlying susceptibility to disease in addition to
Binomial sampling variation. A deviance larger than that expected for Binomial samples
indicates that such variation is present, and this can be accounted for by the dispersion
parameter $\varphi$.
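A small numerical sketch illustrates why the recording scale changes the residual deviance but not the estimates. The counts below are hypothetical (six trees of a single variety, four branches each), the model contains only a constant term so that the fitted probability is the overall proportion, and the function names are our own.

```python
import math

def _xlogy(x, y):
    """Return x * log(y), using the convention that the term is 0 when x = 0."""
    return 0.0 if x == 0 else x * math.log(y)

def binomial_deviance(ys, ms, p):
    """Residual deviance for counts ys out of ms with common fitted probability p."""
    total = 0.0
    for y, m in zip(ys, ms):
        mu = m * p
        total += 2.0 * (_xlogy(y, y / mu) + _xlogy(m - y, (m - y) / (m - mu)))
    return total

# Hypothetical data: infested branches (out of 4) on six trees of one variety
tree_counts = [0, 1, 4, 2, 3, 2]
tree_sizes = [4] * len(tree_counts)

# The same data recorded as one binary score per branch
branch_scores = []
for y in tree_counts:
    branch_scores += [1] * y + [0] * (4 - y)
branch_sizes = [1] * len(branch_scores)

p_hat_tree = sum(tree_counts) / sum(tree_sizes)
p_hat_branch = sum(branch_scores) / len(branch_scores)
dev_tree = binomial_deviance(tree_counts, tree_sizes, p_hat_tree)
dev_branch = binomial_deviance(branch_scores, branch_sizes, p_hat_branch)

print(p_hat_tree, p_hat_branch, round(dev_tree, 2), round(dev_branch, 2))
```

The estimated probability is identical at both scales, but only the tree-level deviance measures variation between experimental units against Binomial sampling variation; the binary-level deviance depends on the fitted probability alone.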

In some circumstances, it can help to aggregate Binomial observations further, to give a
single response for each of the study conditions, i.e. for each combination of explanatory
variables, or group, present. It is appropriate to do this only when replicate observations
are obtained under uniform conditions and no systematic differences between them are
expected. The residual deviance can then be used to assess the fit, and any indication of
over-dispersion indicates lack of fit in the model (as discussed in Section 18.2.2). This is
useful only when the df associated with the model is smaller than the number of groups,
and relies heavily on the assumption of a Binomial distribution to derive the sampling
variance.



18.2.7 The Special Case of Binary Data 

Binary responses, also known as Bernoulli data, are a special case of Binomial data with
only one test per observation (i.e. $m_i = 1$), so that the observations can take only the values
0 or 1. Analysis follows the same procedure as for other Binomial responses, but not all of 
the results discussed above are valid for binary data. In particular, the residual deviance 
does not give a reliable measure of over- or under-dispersion, and so the use of an esti- 
mated dispersion parameter is not recommended. The Pearson and deviance residuals are 
uninformative as they will not be distributed as an approximate Normal distribution, and 
a fitted values plot will often show strong patterns, even if the model is adequate. 

From a practical point of view, it is better to avoid binary observations whenever pos- 
sible, as they provide very little information per observation. One way of doing this is to 
take several independent replicate observations on each unit. For example, if the aim of an 
experiment is to assess disease incidence in a field trial then a binary assessment of each 
plot for presence or absence of the disease will be quick, but gives little information on the 
extent of infection (one plant infected per plot gives the same answer as all plants infected), 
and it can make it difficult to discriminate between treatments. If 10 (independent) plants 
per plot are individually assessed for presence of disease, then responses range from 0 
to 10, giving some information on the extent as well as presence of disease, as well as 
a more tractable analysis. This is an example where sub-sampling within experimental 
units provides valuable extra information and, in this type of situation, data should always 
be considered as total counts within each unit rather than individual binary observations 
(see remarks in Section 18.2.6). In scenarios where binary data are unavoidable, replicate as 
much as possible to counteract the lack of information per observation. 



18.2.8 Other Issues with Binomial Responses 

In this chapter, we have described one common implementation of a Binomial GLM; 
however, many variations are possible. For example, some statistical software prefers 
the Pearson rather than the deviance estimate of the dispersion parameter and provides 
Pearson rather than deviance residuals. Similarly, the dispersion parameter might be fixed 
at 1 by default rather than estimated, or might be estimated but not used within the model 
for testing and inference unless this is explicitly requested. 

The logit link is the canonical link for the Binomial distribution and widely used, par- 
ticularly in medical applications, because of its interpretation in terms of odds-ratios, 
although in practice this may be difficult to explain to non-mathematicians. Historically, 
a method called probit analysis was used for dose-response studies (Finney, 1971), and 
the simplest probit analysis model is equivalent to a Binomial GLM with probit link. The 
probit function is the inverse of the cumulative distribution function for the Normal dis- 
tribution, and can be interpreted in terms of a Normal tolerance distribution. Both the logit 
and probit functions are symmetric around probability p = 0.5 and usually give similar 
answers. Another option is the complementary log-log link function, $\log_e(-\log_e(1 - p))$,
which has asymmetric curvature. In all cases, the fit of a model and residuals should
always be checked graphically, as this may reveal an inappropriate link function.

It is common to find that the response needs to be modelled in terms of the logarithm
of an explanatory variate, rather than in terms of the explanatory variate directly. This is
interpreted by Collett (2002, Section 4.1) in terms of an asymmetric tolerance distribution,
allowing a few individuals to have unusually high tolerances.

One limitation of regression within a Binomial GLM is that the success probability must
tend to zero as the explanatory variate decreases to $-\infty$ and must increase to one as the
explanatory variate increases to $+\infty$. If this is not the case, then a slightly more complex
non-linear model is required; further details of these models are given in Collett (2002,
Chapter 4) or Finney (1971).
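The three link functions mentioned in this section can be compared with a short sketch. The function names are our own, and the probit is computed from the inverse cumulative distribution function of the standard Normal distribution provided by Python's `statistics` module.

```python
import math
from statistics import NormalDist

def logit(p):
    """Log-odds of p."""
    return math.log(p / (1.0 - p))

def probit(p):
    """Inverse cumulative distribution function of the standard Normal."""
    return NormalDist().inv_cdf(p)

def cloglog(p):
    """Complementary log-log transformation."""
    return math.log(-math.log(1.0 - p))

for p in (0.2, 0.5, 0.8):
    print(f"p = {p}: logit = {logit(p):.3f}, probit = {probit(p):.3f}, "
          f"cloglog = {cloglog(p):.3f}")
```

For the logit and probit links the values at p and 1 - p differ only in sign, reflecting their symmetry about p = 0.5; the complementary log-log values do not share this property.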

One common use of logistic or probit regression in a dose-response context is the 
estimation of the dose required to achieve a certain response. For example, in pesticide studies 
the LD50, the dose required to kill 50% of a sample, is often used to compare compounds. 
This is different from a standard prediction in that we are trying to predict the value of the 
explanatory variate at which a certain response is obtained, rather than vice versa. This is 
an example of calibration (sometimes called inverse prediction), and was discussed for 
SLR in Section 12.9.3. Approximate SEs or CIs for this prediction, sometimes called fiducial 
limits, can be obtained from Fieller's theorem (Collett, 2002, Chapter 4). Note that use of 
an LD50 to compare compounds is sensible only if the responses can be fitted by a parallel 
lines model on the linear predictor scale; otherwise, a single value cannot capture the 
overall differences between the compounds. 
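
Inverse prediction can be illustrated with a short Python sketch. It assumes a logistic regression on a single explanatory variate x (for example, log dose), so that logit(p) = alpha + beta * x. The coefficient values and function names below are hypothetical and chosen only for illustration. The sketch gives a point estimate only; it does not compute the fiducial limits from Fieller's theorem.

```python
import math


def predict_probability(alpha, beta, x):
    """Standard prediction: response probability at a given value of x."""
    return 1 / (1 + math.exp(-(alpha + beta * x)))


def inverse_predict(alpha, beta, p=0.5):
    """Inverse prediction: value of x at which the response probability is p."""
    target = math.log(p / (1 - p))  # logit of the required response
    return (target - alpha) / beta


# Hypothetical coefficients on the logit scale (illustration only).
alpha, beta = -3.0, 1.5

x50 = inverse_predict(alpha, beta)          # x giving a 50% response
x90 = inverse_predict(alpha, beta, p=0.9)   # x giving a 90% response
print(f"x50 = {x50:.3f}, x90 = {x90:.3f}")
print(f"check: p(x50) = {predict_probability(alpha, beta, x50):.3f}")
```

If x is the logarithm of dose, the LD50 on the dose scale is obtained by back-transforming x50.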

In Chapter 6, we suggested a logit transformation to deal with proportion data where 
the numbers of trials, $n_i$, are reasonably large (> 20) and roughly equal across units, and 
the observed values are not too extreme (not too many observed proportions close to 0 
or 1). This recommendation is justified as a Normal distribution can provide a reasonable 
approximation to the Binomial distribution under these conditions. This approach is 
particularly helpful when the experimental units are structured (e.g. a split-plot design), 
as this structure cannot always be accounted for easily in the GLM framework (as for 
regression, see Section 15.3). However, in all other cases, the use of the appropriate GLM 
is recommended. 

Finally, think about the intended sampling scheme when Binomial data are to be 
collected. It is important that trials are independent, so there should be no competition for 
resources between the individuals assessed. The number of trials and number of 
observations should also be considered. Increasing the number of trials per observation also 
increases the precision of an individual observation, so very small numbers of trials should 
be avoided whenever possible, but increasing the number of observations may have more 
effect on the precision of the overall analysis. 



18.3 Analysis of Count Data: Poisson Responses 

A Poisson distribution arises as a count of the number of times a phenomenon occurs 
within a fixed interval of time or space. Examples of counts that may be modelled as a 
Poisson distribution include 

• The number of bees arriving at a rape plant per minute 

• The number of mutations in a given length of DNA after radiation is applied 

• The number of pine trees per unit area of mixed forest 

• The number of bacteria in a given volume of liquid 

If an observation is Poisson-distributed, then it can take only non-negative integer 
values, $0, 1, 2, \ldots, +\infty$. In theory, there should be no upper bound, but in practice, some 
physical upper bound can apply without invalidating the Poisson distribution assumption, so 
long as this limit is large enough in relation to the responses to avoid truncating the 
distribution. The Poisson distribution is defined by a single parameter, the mean $\mu$. We write 
that an observation $y_i$ is Poisson-distributed with expected value $\mu_i$ as 
$y_i \sim \text{Poisson}(\mu_i)$. The probability of obtaining a specific value $y_i$ for the $i$th 
observation can be written in terms of its expected value as 

$$\text{Prob}(y_i; \mu_i) = \frac{e^{-\mu_i}\, \mu_i^{y_i}}{y_i!} \quad \text{for } y_i = 0, 1, 2, \ldots, +\infty .$$

The probability of observing a specific outcome, for example, $\text{Prob}(y_i = 0)$, depends only on 
the unknown parameter $\mu_i$. If $y_i$ follows a Poisson distribution, then both its expected value 
and variance are equal to the parameter $\mu_i$, i.e. 

$$E(y_i) = \mu_i ; \quad \text{Var}(y_i) = \mu_i .$$

There is therefore a strong variance-mean relationship for this distribution. In the context 
of GLMs, we hypothesize that the expected value of the observation, $\mu_i$, may depend on 
one or more explanatory variables. 
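
The probability function and the equality of mean and variance can be checked numerically. The sketch below is a minimal illustration in Python; the function name and the choice of mean (4.6) are ours, and the infinite sum is truncated at a point where the remaining probability is negligible for this mean.

```python
import math


def poisson_pmf(y, mu):
    """Probability of observing count y for a Poisson distribution with mean mu."""
    return math.exp(-mu) * mu ** y / math.factorial(y)


mu = 4.6             # illustrative mean
ys = range(0, 60)    # truncation point: the tail beyond 59 is negligible for this mean
probs = [poisson_pmf(y, mu) for y in ys]

total = sum(probs)
mean = sum(y * p for y, p in zip(ys, probs))
var = sum((y - mean) ** 2 * p for y, p in zip(ys, probs))

print(f"sum of probabilities = {total:.6f}")
print(f"mean = {mean:.4f}, variance = {var:.4f}")
print(f"Prob(y = 0) = {poisson_pmf(0, mu):.4f}")
```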



EXAMPLE 18.3A: PEA APHID SURVEY 

An ecological survey was done to investigate the co-occurrence of various insect preda- 
tor and prey species. Here, we consider a subset of the data relating to one aphid species, 
the pea aphid, Acyrthosiphon pisum. In each of three fields, 15 randomly chosen triplets of 
adjacent bean plants were inspected and the number of pea aphids present on the three 
plants was recorded. The data are in Table 18.10, and file aphids.dat contains explana- 
tory factor Field (three levels) to identify the observations by field, factor Sample to label 
the 15 samples within each field, and response variate AphidCount which holds the 
total count of aphids at each sample point. The objective here is to determine whether 
infestation differed among the three fields. 

The data are shown in Figure 18.5: they are discrete counts, and the variance between 
replicate observations for each field ($s_1^2 = 14.69$, $s_2^2 = 119.97$, $s_3^2 = 21.24$) appears to 
increase with the mean count ($\bar{y}_{1\cdot} = 4.6$, $\bar{y}_{2\cdot} = 15.4$, $\bar{y}_{3\cdot} = 6.7$), although the 
variances are clearly much larger than the sample means in each case. 
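
These summary statistics can be reproduced directly from the counts in Table 18.10. The sketch below is a minimal Python illustration; the dictionary layout and variable names are our own, with counts listed in sample order for each field.

```python
from statistics import mean, variance

# Pea aphid counts from Table 18.10, keyed by field, in sample order 1-15.
aphids = {
    1: [0, 3, 5, 15, 7, 5, 2, 5, 4, 6, 1, 5, 1, 9, 1],
    2: [24, 10, 21, 28, 43, 11, 14, 22, 8, 7, 1, 20, 6, 10, 6],
    3: [3, 2, 10, 14, 4, 11, 6, 2, 3, 5, 3, 13, 5, 4, 15],
}

# Sample mean and (unbiased) sample variance for each field.
summary = {field: (mean(counts), variance(counts)) for field, counts in aphids.items()}

for field, (m, v) in summary.items():
    print(f"Field {field}: mean = {m:.1f}, variance = {v:.2f}, variance/mean = {v / m:.1f}")
```

For a Poisson distribution the variance-to-mean ratio would be close to 1, so ratios well above 1 are an early indication of over-dispersion.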



18.3.1 Understanding and Defining the Model 

We recommend reading Section 18.2 before proceeding further as the analysis of Poisson 
responses using a GLM follows the same framework as the analysis of Binomial responses. 
The major difference between the two cases is in the form and interpretation of the model. 
Again, models can be written in terms of quantitative or qualitative variables, or both, but 
here we introduce the Poisson model using a single qualitative variable (factor) and later 
consider other cases. We label the units by groups ($j = 1 \ldots t$) and label observations within 
groups ($k = 1 \ldots n_j$), so that $y_{jk}$ is the response for the $k$th observation in the $j$th group. The 
model with a single explanatory factor is then written as 

$$E(y_{jk}) = \mu_j \quad \text{with} \quad \eta_j = g(\mu_j) = \eta_1 + \nu_j ,$$

where $g()$ is the link function, as described in Section 18.1. Each replicate observation in 
the $j$th group has a common expected value, namely, $\mu_j$, and after transformation by the 
link function, this takes the value $\eta_j$. The model uses first-level-zero parameterization (see 
Sections 4.5, 11.2 or 15.2 for details). The parameter $\eta_1$ represents the transformed value of 
the population mean for the first group, and $\nu_j$ represents the difference between the $j$th 
and first group on the linear predictor scale, with constraint $\nu_1 = 0$. 

TABLE 18.10 

Counts of Pea Aphid from 15 Samples in Three Bean Fields (Example 18.3A 
and File aphids.dat) 

Field  Sample  Count    Field  Sample  Count    Field  Sample  Count
  1       1       0       2       1      24       3       1       3
  1       2       3       2       2      10       3       2       2
  1       3       5       2       3      21       3       3      10
  1       4      15       2       4      28       3       4      14
  1       5       7       2       5      43       3       5       4
  1       6       5       2       6      11       3       6      11
  1       7       2       2       7      14       3       7       6
  1       8       5       2       8      22       3       8       2
  1       9       4       2       9       8       3       9       3
  1      10       6       2      10       7       3      10       5
  1      11       1       2      11       1       3      11       3
  1      12       5       2      12      20       3      12      13
  1      13       1       2      13       6       3      13       5
  1      14       9       2      14      10       3      14       4
  1      15       1       2      15       6       3      15      15

Source: Data from P. Wells, Rothamsted Research. 

FIGURE 18.5 

Counts of pea aphid (•) from each of three bean fields (Example 18.3A) with predicted field counts (•) from the 
fitted model (Example 18.3B). 

The canonical link function for Poisson responses is the log link, and a model using this 
link function is called a log-linear model. The model can then be written as 

$$\eta_j = \log_e(\mu_j) = \eta_1 + \nu_j ,$$

so the natural logarithm of the expected count changes according to the group it 
belongs to. We can rearrange this expression to write the model in terms of the expected 
counts as 

$$\mu_j = \exp(\eta_j) = \exp(\eta_1 + \nu_j) = \exp(\eta_1) \times \exp(\nu_j) .$$

On the natural scale, this is a multiplicative model (see also Section 6.4), and the fitted 
values can take non-negative values only. The Poisson GLM with log link can therefore 
be considered as an exponential model that accounts for the Poisson distribution of the 
responses and their associated heterogeneity. This exponential model is not completely 
general (see Section 17.3), as it is constrained to have a lower asymptote of zero. 

For counts held in response variate Y with groups labelled by the explanatory factor 
Group, this Poisson GLM can be represented in symbolic form as 

Response variable:         Y 
Probability distribution:  Poisson 
Link function:             log 
Explanatory component:     [1] + Group 



As in Section 18.2, to fully specify the GLM, we need to give the probability distribution 
and link function in addition to the response variate and explanatory component of the 
model. 

Parameter estimation is achieved by maximum likelihood estimation, and again 
results will be obtained directly from statistical software rather than being derived here. 
For data with a Poisson distribution, the deviance for a model with fitted values $\hat{\mu}_j$ takes 
the form 

$$2 \sum_{j=1}^{t} \sum_{k=1}^{n_j} y_{jk} \log_e\!\left( \frac{y_{jk}}{\hat{\mu}_j} \right) .$$





Once parameter estimates have been derived, you should use the procedures described in 
Section 18.2 to check the model fit before drawing any conclusions. 

EXAMPLE 18.3B: PEA APHID SURVEY 

We want to fit a model to the pea aphid data of Example 18.3A to investigate whether 
the expected count of this aphid differs among the three fields. Using the explanatory 
factor Field and the response variate AphidCount (see Example 18.3A), we can write the 
model in symbolic form as 

Response variable:         AphidCount 
Probability distribution:  Poisson 
Link function:             log 
Explanatory component:     [1] + Field 

In mathematical form, this model is written with first-level-zero parameterization as 

$$AphidCount_{jk} \sim \text{Poisson}(\mu_j) , \quad \eta_j = \log_e(\mu_j) = \eta_1 + Field_j ,$$

where $AphidCount_{jk}$ is the count for the $k$th observation in the $j$th field ($j = 1, 2, 3$) 
with expected value $\mu_j$. Then, $\eta_j$ is the log-transform of $\mu_j$, and $Field_j$ is the difference 
on the log scale between the $j$th and the first fields. The estimated parameters are 
$\hat{\eta}_1 = 1.526$, $\widehat{Field}_2 = 1.208$ and $\widehat{Field}_3 = 0.371$, giving predicted values 

$$\hat{\eta}_1 = 1.526, \quad \hat{\eta}_2 = 2.734, \quad \hat{\eta}_3 = 1.897 .$$

We can back-transform these values to estimate the expected number of aphids per 
sample in each field as 

$$\hat{\mu}_1 = \exp(1.526) = 4.6, \quad \hat{\mu}_2 = \exp(2.734) = 15.4, \quad \hat{\mu}_3 = \exp(1.897) = 6.7 .$$

These predictions are equal to the mean counts for each field (see Example 18.3A) and 
Figure 18.5 shows these estimated field means with the observations. 
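
Because the fitted values for this single-factor model equal the field means, the parameter estimates above can be recovered without iterative fitting. The following Python sketch does this for the first-level-zero parameterization; it is an illustration for this simple case only (the variable names are ours) and is not a general GLM fitting routine.

```python
import math

# Pea aphid counts from Table 18.10, keyed by field, in sample order 1-15.
aphids = {
    1: [0, 3, 5, 15, 7, 5, 2, 5, 4, 6, 1, 5, 1, 9, 1],
    2: [24, 10, 21, 28, 43, 11, 14, 22, 8, 7, 1, 20, 6, 10, 6],
    3: [3, 2, 10, 14, 4, 11, 6, 2, 3, 5, 3, 13, 5, 4, 15],
}

# For a one-factor Poisson log-linear model the fitted values are the group means.
means = {field: sum(counts) / len(counts) for field, counts in aphids.items()}

# First-level-zero parameterization: eta_1 is the log mean for field 1 and each
# field effect is the difference from field 1 on the log scale.
eta1 = math.log(means[1])
field_effect = {field: math.log(m) - eta1 for field, m in means.items()}

# Linear predictor and back-transformed predictions for each field.
eta = {field: eta1 + effect for field, effect in field_effect.items()}
mu = {field: math.exp(value) for field, value in eta.items()}

for field in aphids:
    print(f"Field {field}: effect = {field_effect[field]:.3f}, "
          f"eta = {eta[field]:.3f}, mu = {mu[field]:.1f}")
```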



18.3.2 Analysis of the Model 

As described in detail in Section 18.2.2, the ANODEV table is formed by a partition of the 
total deviance into the change in deviance between the null model (overall mean) and the 
fitted model, and the change in deviance between the fitted model and the saturated model 
(where each observation is fitted exactly). If the residual deviance is larger than expected, 
then the fit of the model should be examined graphically to check for misspecification or 
outliers and addition of other explanatory variables should be considered. If these measures 
do not reduce the residual deviance to a value consistent with the expected chi-squared dis- 
tribution, then, as with Binomial data, a dispersion parameter can be added to the model, 
so that 



$$\text{Var}(y_i) = \phi \mu_i .$$

The presence of an estimated dispersion parameter changes the interpretation of entries 
in the ANODEV table in the same manner as for Binomial responses (see Section 18.2.2.2), 
requiring the use of deviance ratios and tests based on the F-distribution. In practice, 
over-dispersion is usually present for count data. For models with several explanatory 
terms, sequential ANODEV tables or marginal tests can be used to identify the 
predictive model. Assessment of individual parameters and prediction follows as described in 
Section 18.2.4. 

EXAMPLE 18.3C: PEA APHID SURVEY 

Table 18.11 is the ANODEV for the model of Example 18.3B. Here, the residual deviance, 
ResDev, has value 191.475 with 42 df (P < 0.001 when compared with a chi-squared 
distribution with 42 df). So there is strong evidence of over-dispersion for this model, 
which fits with our preliminary observation that the within-field variances were 
much larger than the within-field means (Example 18.3A). We might speculate that 
this over-dispersion arises from variation in prevalence (patchiness) between 
different areas of each field and perhaps between plants. The deviance estimate of the 
dispersion parameter is equal to 4.559 and the model deviance ratio is calculated as 
F = 52.728/4.559 = 11.566. Compared with an F-distribution with 2 and 42 df, this 
deviance ratio is highly significant (P < 0.001). We therefore reject the null hypothesis and 
conclude that there are statistically significant differences in the mean count of pea 
aphids between fields. 
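
The deviances, dispersion estimate and deviance ratio quoted above can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the deviance function uses the general Poisson form, 2 Σ [y log(y/μ) − (y − μ)], in which the second term sums to zero for the models fitted here, and observations with y = 0 contribute nothing to the logarithmic term. Function and variable names are our own. P-values are not computed, as they require chi-squared and F-distribution functions that are not in the standard library.

```python
import math

# Pea aphid counts from Table 18.10, keyed by field, in sample order 1-15.
aphids = {
    1: [0, 3, 5, 15, 7, 5, 2, 5, 4, 6, 1, 5, 1, 9, 1],
    2: [24, 10, 21, 28, 43, 11, 14, 22, 8, 7, 1, 20, 6, 10, 6],
    3: [3, 2, 10, 14, 4, 11, 6, 2, 3, 5, 3, 13, 5, 4, 15],
}


def poisson_deviance(observed, fitted):
    """Poisson deviance: 2 * sum(y * log(y / mu) - (y - mu)), with 0 * log(0) = 0."""
    total = 0.0
    for y, mu in zip(observed, fitted):
        if y > 0:
            total += y * math.log(y / mu)
        total -= y - mu
    return 2 * total


counts = [y for field_counts in aphids.values() for y in field_counts]
grand_mean = sum(counts) / len(counts)
# Fitted value for each observation under the Field model: its own field mean.
field_fit = [sum(c) / len(c) for c in aphids.values() for _ in c]

total_dev = poisson_deviance(counts, [grand_mean] * len(counts))  # null model
resid_dev = poisson_deviance(counts, field_fit)                   # Field model
model_dev = total_dev - resid_dev

model_df = len(aphids) - 1
resid_df = len(counts) - len(aphids)
dispersion = resid_dev / resid_df                  # deviance estimate of dispersion
f_ratio = (model_dev / model_df) / dispersion      # deviance ratio for Field

print(f"Field:    df = {model_df:2d}, deviance = {model_dev:.3f}")
print(f"Residual: df = {resid_df:2d}, deviance = {resid_dev:.3f}")
print(f"dispersion = {dispersion:.3f}, deviance ratio = {f_ratio:.3f}")
```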

Figure 18.6 shows the composite set of residual plots for these data based on standard- 
ized deviance residuals (Section 18.2.3). In these plots, the residuals appear somewhat 
skewed, but there is no strong evidence of variance heterogeneity and the Normal plots 
form approximately straight lines, so the model appears to give an adequate description 
of the data. 

Further discussion with the investigator revealed that samples were taken along 
transects rather than from random positions in each field. In this case, one might 
suspect dependence between samples, with samples closer together on a transect 
being more strongly correlated than those further apart. We investigate dependence 
(Section 5.2.2) using an index plot of the standardized residuals against transect posi- 
tion (sample number) for each of the three fields separately (Figure 18.7a), and by plot- 
ting each residual against the residual for the previous sample on the same transect 
(Figure 18.7b). There is no evidence in either graph of correlation between successive 
observations. 

TABLE 18.11 

ANODEV Table for the Pea Aphid Survey with Explanatory Factor 
Field (Example 18.3C) 

Source of Variation    df    Deviance    Mean Deviance    P (Chi-Squared)
Field                   2     105.456        52.728           < 0.001
Residual               42     191.475         4.559
Total                  44     296.931         6.748

FIGURE 18.6 

Composite set of residual plots based on standardized deviance residuals for the pea aphid survey (Example 18.3C). 

FIGURE 18.7 

(a) Index plot of standardized deviance residuals against transect position within each of three fields and (b) 
plot of residuals against previous residuals (within transects) for the pea aphid survey (Example 18.3C). • field 
1, ° field 2, • field 3. 

We therefore accept the model and move on to interpretation. Our main interest is in 
quantifying differences between fields. With a log link function, we can use the 
property that differences on the log scale back-transform to give ratios on the natural scale 
(see Section 18.2.5). On the log scale, the estimated difference between the second and 
first fields is 

$$\hat{\eta}_2 - \hat{\eta}_1 = \widehat{Field}_2 = 1.208 ,$$

with SE = 0.2929 and P < 0.001, indicating significantly larger counts in field 2. Since 

$$\hat{\eta}_2 - \hat{\eta}_1 = \log_e(\hat{\mu}_2) - \log_e(\hat{\mu}_1) = \log_e(\hat{\mu}_2 / \hat{\mu}_1) ,$$

the back-transformation of this difference gives us 
$\hat{\mu}_2 / \hat{\mu}_1 = \exp(\hat{\eta}_2 - \hat{\eta}_1) = \exp(1.208) = 3.35$. The expected count in field 2 is 
therefore estimated to be 335% of the expected count in field 1. We can construct a 95% CI 
for this quantity on the log scale as 

$$\left[ (\hat{\eta}_2 - \hat{\eta}_1) \pm t_{42}^{[0.025]} \times \text{SE}(\hat{\eta}_2 - \hat{\eta}_1) \right] = [1.208 \pm (2.018 \times 0.2929)] = (0.617, 1.799) .$$

When transformed back to the natural scale, the CI is exp(0.617, 1.799) = (1.85, 6.05), 
indicating that the ratio of expected counts between the fields may be smaller than 2 or as 
large as 6. If we follow a similar procedure for comparing the third field with the first 
field, we find 

$$\hat{\eta}_3 - \hat{\eta}_1 = \widehat{Field}_3 = 0.371 ,$$

with SE = 0.334 and no evidence of a difference in expected count between these two 
fields (P = 0.273). Back-transformation estimates the ratio as 1.45 with 95% CI equal to 
(0.74, 2.84), confirming that a ratio of 1 is a plausible value. A similar calculation can be 
carried out for fields 2 and 3. 
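
The ratio estimates and their confidence limits can be sketched in Python as follows. Two assumptions are made that are not stated in the text. First, the SE of a difference in log means is taken from the standard large-sample result for this single-factor model, the square root of the dispersion multiplied by the sum of the reciprocal field totals, which reproduces the SEs quoted above. Second, the critical value of the t-distribution on 42 df (approximately 2.018) is entered by hand from tables. Function and variable names are our own.

```python
import math

# Pea aphid counts from Table 18.10, keyed by field, in sample order 1-15.
aphids = {
    1: [0, 3, 5, 15, 7, 5, 2, 5, 4, 6, 1, 5, 1, 9, 1],
    2: [24, 10, 21, 28, 43, 11, 14, 22, 8, 7, 1, 20, 6, 10, 6],
    3: [3, 2, 10, 14, 4, 11, 6, 2, 3, 5, 3, 13, 5, 4, 15],
}

DISPERSION = 4.559   # deviance estimate of dispersion from the ANODEV table
T_CRIT_42 = 2.018    # 97.5% point of the t-distribution on 42 df (from tables)


def ratio_with_ci(field_a, field_b):
    """Estimated ratio of expected counts (a / b) with SE on log scale and 95% CI."""
    total_a, total_b = sum(aphids[field_a]), sum(aphids[field_b])
    mean_a = total_a / len(aphids[field_a])
    mean_b = total_b / len(aphids[field_b])
    diff = math.log(mean_a) - math.log(mean_b)            # difference on log scale
    # Assumed large-sample SE for a difference of log means in this model.
    se = math.sqrt(DISPERSION * (1 / total_a + 1 / total_b))
    lower, upper = diff - T_CRIT_42 * se, diff + T_CRIT_42 * se
    return math.exp(diff), se, (math.exp(lower), math.exp(upper))


ratio21, se21, ci21 = ratio_with_ci(2, 1)
ratio31, se31, ci31 = ratio_with_ci(3, 1)
print(f"field 2 / field 1: ratio = {ratio21:.2f}, SE(log) = {se21:.4f}, "
      f"95% CI = ({ci21[0]:.2f}, {ci21[1]:.2f})")
print(f"field 3 / field 1: ratio = {ratio31:.2f}, SE(log) = {se31:.3f}, "
      f"95% CI = ({ci31[0]:.2f}, {ci31[1]:.2f})")
```

Calling `ratio_with_ci(2, 3)` gives the corresponding comparison of fields 2 and 3.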



18.3.3 Analysing Poisson Responses with Several Explanatory Variables 

In Example 18.3, we considered the case of Poisson responses with a single explanatory fac- 
tor. We can also use log-linear models for variates or a mixture of factors and variates. Below, 
we demonstrate the modelling process for two explanatory variables, a factor and a variate. 

EXAMPLE 18.4: CONIDIAL RELEASE EXPERIMENT 

An experiment was set up with the primary aim of measuring aphid infection rates 
in response to differing doses of fungus. Aphids in inoculation chambers were sub- 
jected to conidia showers from sporulating cadavers from one of two different sources 
(a clone or a standard source) for one of eight time periods ranging from 0 to 80 min. 
Estimates of the conidial doses received by the aphids were obtained as counts of 
spores on slides placed in the chambers. Here, we investigate the relationship between 
the achieved dose (variate Conidia) and infection time (variate Time) for the two types 
of source (factor Source). Each time period and source combination was tested in each 
of two experimental runs (factor Run). Separate sources were used for each replicate 
of each time period and the observed counts are listed in Table 18.12. 

TABLE 18.12 

Number of Conidia Released by Different Sources over Eight Time Periods 
(Example 18.4 and File conidia.dat) 

                     Source
                Standard            Clone
Time (min)   Run 1    Run 2    Run 1    Run 2
     0           0        0        0        0
     5           6       71        8       44
    10          71      223      173      209
    15         157      426      165      383
    20         568     1391      584     1188
    25         883     1098     1296      627
    40        1436      993      400     1628
    80        3543     4295     4981     4302

Source: Data from J. Baverstock, Rothamsted Research. 

The zero time period is a negative control: it should not be possible for any conidia to 
be released in no time, so this category just checks for contamination of slides, and the 
resulting zero counts verify that this was not present. We remove this category prior 
to analysis as it contains no information relating to the explanatory variable (see also 
discussion in Section 8.5). The data, excluding the zero time periods, can be found in file 
conidia.dat. The aim of this analysis is to establish whether there is any difference in 
release rates between the two sources and this can be interpreted as a regression with 
groups (see also Section 15.1). 

Preliminary investigation, plotting the log number of conidia against the time period, 
indicated a curved relationship in terms of time, but an approximate straight line 
relationship with a log transformation of time; hence, we construct the explanatory variate 
logTime = log_e(Time). The experiment is set up as a RCBD, with runs as blocks and all 
the experimental conditions evaluated once within each run. As in Example 18.2, we 
incorporate the structural component (factor Run) in the explanatory component of the 
model to obtain an intra-block analysis, and fit the Run factor before the explanatory 
terms (see Section 15.3). The initial model fits separate lines for each source. In 
addition, as there are replicates for each treatment combination, we can formally investigate 
model misspecification with the lack-of-fit test described in Section 12.8, using a factor 
Period that has a separate level for each time period. The initial model can therefore be 
written in symbolic form as 

Response variable:         Conidia 
Probability distribution:  Poisson 
Link function:             log 
Explanatory component:     [1] + Run + logTime + Period + Source 
                           + logTime.Source + Period.Source 



The residual deviance of 2006.1 with 13 df for this model indicates substantial 
over-dispersion (P < 0.001). This cannot be explained in terms of outliers, misspecification or 
missing explanatory variables, and the residual plots (not shown) are adequate, so we 
use an estimated dispersion parameter, $\hat{\phi} = 2006.1/13 = 154.3$. We use marginal F-tests 
to identify the predictive model, respecting marginality, and the model selection 
process is shown in Table 18.13. 

TABLE 18.13 

Observed Significance Level (P) for Marginal F-Tests in a Sequence of 
Models for the Conidial Release Experiment with Explanatory Variate 
logTime and Explanatory Factors Run, Period and Source (Example 18.4) 

                                     P
Term              Model 1    Model 2    Model 3    Model 4
Run                  —          —          —          —
logTime              —          —          —          —
Period               —        0.038      0.034      0.028
Source               —          —        0.666        *
logTime.Source       —        0.432        *          *
Period.Source      0.987        *          *          *

Note: — = term in model but not eligible for testing, * = term omitted from model. 

We start with the full model (Model 1 in Table 18.13) and examine the lack-of-fit term, 
Period.Source, which tests for deviations from the separate straight lines for each 
source. This term is not statistically significant (P = 0.987) and so we drop it. As we 
have few residual df here (ResDF = 13), we choose to refit the model excluding term 
Period.Source before proceeding to Model 2 in Table 18.13. Dropping a term does not 
change the other incremental deviances or mean deviances, but the dropped term is 
merged with the residual and so the residual deviance, residual df and deviance ratios 
all change. Because the deviance ratio of the Period.Source term was substantially less 
than 1 and the residual df were small, the residual mean deviance for the revised model 
is much reduced (equal to 116.4 with 18 df). We can then examine terms logTime.Source 
(separate lines, P = 0.432) and Period (lack of fit to common line, P = 0.038). At this stage, 
we drop term logTime.Source and refit to get a parallel lines model with lack of fit 
(Model 3 in Table 18.13). We can then test terms Source (separate intercepts, P = 0.666) 
and Period (lack of fit, P = 0.034). There is therefore no need for separate intercepts, so 
we drop term Source, leaving the SLR with lack of fit (Model 4), which cannot be 
simplified further. This predictive model can be written in symbolic form as 

Explanatory component: [1] + Run + logTime + Period 

This fits a separate effect for each time period, and is equivalent to the simpler form 

Explanatory component: [1] + Run + Period 

We can write this model in mathematical form as 

$$\log_e(\hat{\mu}_{ij}) = \hat{\eta}_{ij} = \hat{\eta}_{11} + \widehat{Run}_i + \widehat{Period}_j ,$$

where $\hat{\mu}_{ij}$ is the prediction of the expected value of counts in the $j$th time period for the 
$i$th run. To predict for an average run, we average over the runs to get 

$$\hat{\eta}_{\cdot j} = \hat{\eta}_{11} + \frac{1}{2} \sum_{i=1}^{2} \widehat{Run}_i + \widehat{Period}_j .$$

To determine the extent and source of the lack of fit, we can compare these 
predictions with those obtained from a model excluding the lack-of-fit term, with explanatory 
component 

Explanatory component: [1] + Run + logTime 

This predictive model can be written in mathematical form in terms of continuous time 
($t$) as 

$$\hat{\eta}_i(t) = \hat{\alpha} + \widehat{Run}_i + \hat{\beta} \log_e(t) ,$$

and again, we can average this model over runs to predict for a typical run as 

$$\hat{\eta}(t) = \hat{\alpha} + \frac{1}{2} \sum_{i=1}^{2} \widehat{Run}_i + \hat{\beta} \log_e(t) = \hat{\alpha}^* + \hat{\beta} \log_e(t) .$$

Figure 18.8 shows these predictions from both versions of the model on the natural and 
linear predictor scales. 

FIGURE 18.8 

Observations (•) with predicted response from SLR ( — ) with 95% confidence intervals ( — ), and from model 
incorporating lack of fit (•) on (a) natural and (b) log scale for the conidial release experiment (Example 18.4). 

There are two time periods, 20 and 40 min, where the counts appear inconsistent with 
the fitted line, either consistently larger (at 20 min) or smaller (at 40 min) than expected. 
Further investigation is required to determine whether this irregular behaviour is 
characteristic of the experimental system or an anomaly specific to this trial. In either case, 
since the fitted line broadly follows the observed trend, we can use it to indicate likely 
levels of conidial release to help design further experiments. We can transform the 
predictive model back to the natural scale as 

$$\hat{\mu}(t) = \exp(\hat{\eta}(t)) = \exp(\hat{\alpha}^* + \hat{\beta} \log_e(t)) = \exp(\hat{\alpha}^*) \exp(\log_e(t^{\hat{\beta}})) = \hat{\lambda} t^{\hat{\beta}} ,$$

where $\hat{\lambda} = \exp(\hat{\alpha}^*)$. Our predictive GLM is therefore equivalent to a power model which 
is constrained to pass through the origin, i.e. $\hat{\mu}(0) = 0$, while accounting for variance 
heterogeneity and the strong variance-mean relationship inherent to the Poisson 
distribution. The slope coefficient ($\beta$) is estimated as 1.43 with 95% CI equal to (1.20, 1.65), 
so the power relationship is greater than linear ($\beta = 1$) but less than quadratic ($\beta = 2$). 
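
The equivalence between the log-linear predictor and the power model can be checked numerically, and the slope can be read as the effect of doubling the time period. In the Python sketch below, the slope and its confidence limits are those quoted above, but the intercept value is hypothetical (the estimate is not reported in the text) and is used only to show that the two forms agree. Function names are our own.

```python
import math


def predict_from_linear_predictor(alpha_star, beta, t):
    """Expected count via the linear predictor: exp(alpha* + beta * log(t))."""
    return math.exp(alpha_star + beta * math.log(t))


def predict_from_power_model(lam, beta, t):
    """Expected count via the equivalent power model: lambda * t ** beta."""
    return lam * t ** beta


beta = 1.43                  # slope estimate quoted in the text
beta_ci = (1.20, 1.65)       # 95% CI quoted in the text
alpha_star = 3.0             # HYPOTHETICAL intercept, for illustration only
lam = math.exp(alpha_star)

# The two forms give the same prediction at every time period.
times = (5, 10, 15, 20, 25, 40, 80)
max_gap = max(
    abs(predict_from_linear_predictor(alpha_star, beta, t)
        - predict_from_power_model(lam, beta, t))
    / predict_from_power_model(lam, beta, t)
    for t in times
)

# Doubling the time multiplies the expected count by 2 ** beta, whatever the intercept.
doubling_factor = 2 ** beta
doubling_ci = tuple(2 ** b for b in beta_ci)

print(f"largest relative gap between the two forms: {max_gap:.2e}")
print(f"doubling time multiplies the expected count by {doubling_factor:.2f} "
      f"(95% CI {doubling_ci[0]:.2f} to {doubling_ci[1]:.2f})")
```

A doubling factor between 2 (linear) and 4 (quadratic) restates the conclusion that the relationship is greater than linear but less than quadratic.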



18.3.4 Other Issues with Poisson Responses 

The form of the variance-mean relationship in the Poisson model is quite restrictive and 
is not appropriate for all count data. The Negative Binomial probability distribution pro- 
vides an extension that allows for some clustering in the responses by introducing another 
parameter into the model. This can be useful for zero-inflated Poisson responses, which 
occur when the responses resemble a Poisson distribution but with an unusually large 
number of zero counts. More sophisticated mixture models are also available in this con- 
text, and further details can be found in Ridout et al. (2001). 

A GLM using the Poisson distribution with the log link function deals with discrete counts 
where the variance increases with the expected value. In Chapter 6, we suggested the log- 
arithm transformation for data with this type of variance-mean relationship. If all of the 
expected counts are reasonably large (i.e. > 10), then a Normal distribution often provides 
a good approximation to the Poisson distribution. However, there is one important 
distinction between the GLM and transformation approaches in this case. As we saw in Chapter 6, 
the transformation approach leads to group population means being estimated by the group 
geometric means, as the means are taken after the logarithm transformation. In the GLM, the 
logarithm transformation is made on the expected value, so that (in simple cases) the estimated 
count for each group is the arithmetic mean. This is a major advantage of the GLM approach 
over transformation. The only disadvantage of the GLM approach is that it can be difficult to 
account properly for complex structure in the experimental units, where this is present. 
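
The distinction between the two estimates can be seen with a small Python sketch. It uses the field 2 counts from Table 18.10, chosen here because they contain no zeros, so that logarithms can be taken without adding an offset; this choice and the variable names are ours.

```python
import math
from statistics import mean

# Field 2 pea aphid counts from Table 18.10 (no zero counts).
counts = [24, 10, 21, 28, 43, 11, 14, 22, 8, 7, 1, 20, 6, 10, 6]

# GLM with log link (single group): the fitted count is the arithmetic mean.
glm_estimate = mean(counts)

# Log transformation: the mean is taken on the log scale and then back-transformed,
# which gives the geometric mean.
transform_estimate = math.exp(mean(math.log(y) for y in counts))

print(f"arithmetic mean (GLM)           = {glm_estimate:.1f}")
print(f"geometric mean (transformation) = {transform_estimate:.1f}")
```

The geometric mean is never larger than the arithmetic mean, and for these skewed counts it is noticeably smaller (about 11.5 compared with 15.4).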



18.4 Other Types of GLM and Extensions 

In this chapter, we have considered Binomial and Poisson responses as being those most 
commonly encountered in biological research. Here, we describe two other common types 
of response that can be analysed using GLMs. 

An extension to the case of Binomial proportions occurs when trials have more than two 
outcomes, which are ordered (ordinal responses). For example, instead of a plant being 
classified as healthy or infected, it might be classified as healthy or with slight, moderate 
or severe infection, giving four ordered outcomes instead of two. Models to deal with this 
situation are often called ordinal regression, and are related to logistic regression, but are 
beyond the scope of this book. Further details can be found in Agresti (2010). 

Contingency tables summarize counts when each unit has been classified in terms of 
several factors. This type of data often arises from surveys. For example, a survey of farms 
might classify several weed species according to their growth habits, winter hardiness, 
and abundance in different types of crop, and the number of fields in each habit × 
hardiness × abundance × crop category forms a contingency table. The aim of analysis would be 
to establish any association between the classifying factors. For simple surveys, the table 
may be classified by just two factors, in which case the usual Pearson chi-squared test of 
association is appropriate (McClave and Sincich, 2012). For more complex surveys, a GLM 
can be used to investigate patterns of association. In this case, the responses have a 
multinomial distribution, but after conditioning on marginal totals, it can be shown that this is 
equivalent to fitting a GLM with a Poisson distribution and log link. A thorough overview 
of the area is given by Agresti (2007). 

Finally, we note that the Normal distribution with the identity link function (i.e. no 
transformation) is a special case of a GLM. However, treating this case as a GLM leads to 
exactly the same analysis as discussed in the previous chapters, and so to avoid potential 
confusion we have not elaborated the connections here. 

Since the GLM framework does not allow specification of a structural component 
within the model, we have used an intra-block analysis to deal with blocking structure 
in Examples 18.2 and 18.4. Other forms of analysis that explicitly account for a structural 
component, but which are beyond the scope of this book, include generalized linear mixed 
models (GLMMs, see Stroup, 2012), and hierarchical generalized linear models (HGLMs, 
see Lee et al., 2006). 

EXERCISES 

18.1 A series of experimenfs invesfigafed fhe inferacfions befween a fungus fhaf 
infesfs aphids and broad bean planfs. Here, we consider dafa from a frial in 
which germinafion of fhe fungal conidia was assessed on adulf aphids. A bafch 
of 50 aphids was exposed fo fungal conidia fhen splif info groups of 10 aphids 
which were allocafed fo five planfs. Each planf was allocafed fo a sample fime: 

3, 6, 9, 12 or 24 h. Af each fime, 10 adulf aphids were sampled from fhe desig- 
nafed planf and examined under a microscope fo defermine fhe fofal number 
of conidia presenf and fhe number fhaf had germinafed. The numbers of ger- 
minafed and fofal conidia on aphids from each planf (variafes NGerm, Total) 
are given with the sample time (variate Time) in file germination.dat. Use a 
GLM wifh a Binomial disfribufion fo invesfigafe fhe paffern of germinafion 
over fime, remembering fo check for over-dispersion. Defermine fhe predicfive 
model and inferpref fhe resulfs.* 

18.2 Exercise 13.2 (file cabbage.dat) analysed fhe numbers of leaves on a sef of cab- 
bage planfs as a funcfion of days affer fransplanfafion. A log fransformafion 
was used fo deal wifh variance heferogeneify. Repeaf fhe analysis using a GLM 



Data from J. Baverstock, Rothamsted Research. 
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with a Poisson distribution for the untransformed response and a log link func- 
fion. Whaf is your esfimafe for fhe growfh rafe from fhis model? Is if compa- 
rable wifh fhaf from your model from Exercise 13.2? 

18.3* A pilot study investigated the period of leaf wetness required to success-
fully infect leaves with a foliar disease. Trays of four young plants with four
leaves were sprayed with inoculum and then kept wet for a period of 16,
24, 48 or 72 h. The experiment used a CE cabinet with four shelves and was
designed as a RCBD, with shelves used as blocks. File wetness.dat holds the
unit numbers (ID), structural factors (Shelf, Tray) with the wetness period
(variate Wetness) and number of leaves infected (variate NInf, number out of
16). What distribution might you expect the number of infected leaves to fol-
low? Use a suitable GLM to model the number of infected leaves in each tray,
taking account of the design structure by including shelves in the model.
Check for evidence of over-dispersion, check residual plots and carry out a
formal test for lack of fit. Is there any evidence that wetness period affects the
number of infected leaves? Predict the probability that a leaf is infected after
36 h of wetness, and give confidence limits for this prediction.

18.4 Example 12.2 analysed a set of insect counts from a transect sample and we
used a log transformation to deal with variance heterogeneity. Repeat the
analysis (the data are in file transect.dat) using a suitable GLM and compare
your results with the original analysis. Which analysis do you think is more
appropriate?

18.5 The ecological survey described in Example 18.3 took several samples from
each field surveyed, using the same transects and distances, but not necessar-
ily the same plants in each sample. File aphids2.dat contains data for the pea
aphid collected from the next sample after the one analysed in Example 18.3.
Repeat this analysis for the new sample. What conclusions do you draw? Can
you extend your analysis to take account of the previous sample?

18.6 A greenhouse trial was undertaken to evaluate 63 families of loblolly pine for
resistance to pine rust. The experiment was a RCBD with five replicates (blocks),
and several seedlings from each family were tested in each replicate. Sets of
seedlings were grown in trays, and each tray held 8-31 seedlings (median 17).
File RUST.DAT holds information on the design (ID, factors Rep, DTray) and fam-
ily allocation (factor Family) with the number of seedlings affected by rust and
total number in each tray (variates Rust, NSeedling). Use a GLM to estimate the
probability of rust occurring on a seedling for each family, after accounting for
differences between replicates. Is there any evidence of differences in resistance
among families? Identify the families where individual trees have less than
20% probability of being affected by rust.*

18.7 In Exercise 17.3, you analysed a set of field trials (data file cwtrials.dat) to
investigate whether the number of chickweed seeds produced by a plant could
be related to its biomass, using a log transformation on both the response and
dry weights. Repeat the analysis on the untransformed number of seeds using
a suitable GLM. Does this model account for the variance heterogeneity?



* Data from FBRC, University of Florida.

18.8 Data from an agronomic trial is available to assess the effect of fungicide and
a biological control agent on the incidence of white rot on onions. The trial was 
designed as a RCBD with five blocks of 12 plots. The 12 treatments were all 
combinations of three varieties with presence or absence of the fungicide (two 
levels) and the biological control agent (BCA, two levels). File bca.dat holds 
the unit numbers (ID), structural factors (Rep, Plot), treatment factors (Variety, 
Fungicide, BCA) and the total number of plants per plot (variate Emerged) and 
number with symptoms of white rot (variate Disease). Use a suitable GLM to 
identify a predictive model for both emergence and disease incidence. Note 
that the number of emerged plants is a small proportion of the seeds sown 
(which was not counted but was constant across plots) so is small compared 
to the unknown upper limit. What treatment would you recommend to maxi-
mize the number of unaffected plants for each variety?*

18.9 An investigation of response to insecticide used 28 cages of clones each pro- 
duced from a single aphid. There were 14 cages of each type of clone (S and 
R) and a target dose of active compound was applied to each cage, with the 
actual dose recorded. After a given period, the number of moving aphids in 
each cage was counted, and the clones were classified according to presence 
of a marker suspected to affect tolerance of the compound. File clone.dat 
contains unit numbers (ID), clone type (factor Clone), marker presence (factor 
Marker), and the logarithm of the dose applied (variate LogDose) with the 
number of moving aphids (variate Moving) and total aphids (variate Total) in 
each cage. Plot the data and comment on the structure of the groups (combi- 
nations of clones and marker types). Identify and write down a parsimonious 
predictive model to describe the data.†

18.10 A cage experiment was used to investigate the effect of three related insecticides 
on colonies of aphids with partial resistance to their common active compound. 
There were eight treatments: all combinations of the three insecticides or con- 
trol (no insecticide) with two types of colony (susceptible or partially resistant). 
The experiment was organized as a RCBD with six blocks of eight cages, and 
one treatment combination was allocated to each cage in each block. A colony 
of the designated type was reared in each cage, and the number of live aphids 
was counted before the insecticide treatment was applied and then 2 and 6 days 
after application. Both births and deaths could occur within each cage between 
assessments. File repeat.dat holds the structural factors (ID, Block, Cage), treat-
ment factors (Insecticide, Clone) and responses (variates Pre, Day2, Day6). First,
use a GLM to analyse the numbers before the insecticide treatment is applied.
Should you take account of any differences in your analysis of the post-treat-
ment numbers? How can you do this? How does this change the interpretation
of the analysis?‡

* Data from J. Clarkson, University of Warwick.
† Data from S. Foster, Rothamsted Research.
‡ Data from Horticulture Research International.

18.11 The viability of carrot seed depends greatly on the conditions under which
it is stored. Four batches of seed were stored in different conditions (labelled
A-D). One hundred seeds were sampled from each batch: conditions A and
B were sampled approximately every 60 days and conditions C and D were
sampled approximately every 30 days, and the number of non-viable seeds
was evaluated. File carrot.dat contains unit numbers (ID), the structural fac-
tors (Batch, Sample), explanatory variables (factor Condition, variate Days) and
response (variate Count). Use a GLM to model the number of non-viable seeds
over time in each condition and check the fit of the model carefully. Is there any
evidence of model misspecification? Identify any features of the data that are
incompatible with the GLM.*

* Data from D. Gray, Horticulture Research International.

In the preface, we identified the aim of this book as being to provide an introductory, practi-
cal and illustrative guide to the design of experiments and subsequent data analysis in the 
biological and agricultural sciences. We have provided a brief overview of basic statisti- 
cal concepts and terminology in Chapter 1, and ideas of summary statistics, probability 
distributions and simple statistical estimates and tests in Chapter 2. The bulk of the rest 
of the book has introduced and developed various statistical approaches associated with 
designing experiments and analysing the data generated (Chapters 3 to 11) or with ana- 
lysing regression models (Chapters 12 to 15). We have tried to use common terminology 
across these sections to emphasize that the same form of model, the linear model, under- 
lies all of these situations. We then described some more advanced techniques. Chapter 16 
introduced linear mixed models that allow analysis of models with a structural component 
and any mixture of factors and variates in the explanatory component, with no require- 
ment for a balanced structure. Chapter 17 extended the regression modelling approach to 
allow curved responses and non-linear models, and Chapter 18 introduced generalized 
linear models (GLMs) that allow analysis of models with any mixture of factors or vari- 
ates in the explanatory component for data with certain types of non-Normal distribution. 
Throughout the book we have introduced real examples, either drawn from or inspired 
by our own experiences of working with scientists in research institutes and university 
departments. Our aim has been to show how the statistical approaches in this book can be 
used to address a range of real-life research problems across a number of application areas. 

We hope that you have reached this final chapter of this book having worked through 
each of the preceding chapters and attempted some of the exercises. You should now have 
sufficient understanding of the various statistical concepts to enable you to apply what 
you have learnt to your own research. In this final chapter, we attempt to draw the vari- 
ous strands of this book together by introducing case studies that illustrate how to use 
this accumulated knowledge to develop appropriate designs for different experimental 
scenarios, and by discussing how to apply sensible analysis approaches for individual 
scientific problems. 

We start with a summary of the various issues concerned with designing real studies 
(Section 19.1). We should consider the aims, hypotheses and treatments associated with a 
study separately from the available resources and constraints before allocating the treat- 
ments to the experimental material to construct an efficient design. During the design and 
planning stages, we also need to identify an analysis approach that enables us to address 
the aims of the study. In Section 19.2, we summarize and compare the various approaches 
introduced within the book, drawing out the similarities and differences between analysis 
methods for designed experiments and observational studies, and linking the analysis 
approach to the experimental aims. Finally, we discuss the information that needs to be 
presented when publishing the results of a study, including a description of the study 
design, data collection and analysis approaches, presentation of the results as provided by 
statistical software, and the interpretation of these results in the context of the scientific 
problem (Section 19.3).

19.1 Designing Real Studies

The basic principles that we need to consider when designing any experimental or obser- 
vational study are always the same (as introduced in Chapter 3), but it is important to 
remember that almost every new study will be unique, in terms of either the questions to 
be asked or the resources that are available, or both. So, to develop an appropriate design 
for any new study it is important that we explore both of these components (questions to
be asked, available resources) separately before finding the best way of combining them.

19.1.1 Aims, Objectives and Choice of Explanatory Structure 

A sensible starting point is always to carefully consider the aims of the study (Section 
3.1). In broad terms, these aims may be associated with identifying important differences 
in the response between treatments (combinations of selected levels of explanatory vari- 
ables), understanding how the response from a biological system varies with changes in 
one or more explanatory variables, assessing how the response to one explanatory vari- 
able is affected by other explanatory variables, or simply in finding the combination of 
levels of explanatory variables that produces the best response. In most cases, it should 
be possible to re-express these aims and objectives in terms of testable hypotheses, which 
should then lead directly to the identification of the explanatory variables, the experimen- 
tal treatments and the explanatory component of the model to include in the design of the 
study. Sometimes a study forms part of a larger research project, possibly being one of a 
sequence of studies or experiments, where information collected from previous studies 
should inform the design of this new study, or where we are gathering information that 
will be used to inform later studies. However, similar aims at different stages of a substan- 
tial research project may need to be addressed in different ways. 

In Chapter 8, we discussed various ideas about extracting information from the explana- 
tory component of the model to answer specific scientific questions. Consideration of the 
best approach should be included at the design stage of a study. Where there are multiple 
possible input variables, it is important to decide whether the inclusion of a factorial struc- 
ture is useful; the possible benefits were described in Section 8.2.5. 

At early stages in a project, the primary interest may be to identify those explanatory 
variables that have a major impact on the response (sometimes called screening), rather
than to determine the exact impact of each explanatory variable. An effective approach to 
this problem would be to use each explanatory variable at just two levels (low and high) 
within a multi-factorial arrangement. In industrial experimentation, specialized design 
approaches (e.g. Plackett-Burman designs, Plackett and Burman, 1946; see also Mead et al., 
2012, Chapter 14) have been developed to provide a highly efficient approach for screening 
a large number of potential explanatory variables (although requiring the assumption that 
only the main effects of each explanatory variable are important). Once the most impor- 
tant explanatory variables have been identified, interest turns to the pattern of response 
across these key variables, using a factorial structure with those variables evaluated across 
a wider range of levels. Of course, this idea of screening explanatory variables using a 
small subset of their possible levels is really relevant only where the explanatory vari- 
ables lie on some quantitative scale - for truly qualitative explanatory variables the con- 
cepts of low and high levels are meaningless (Section 1.3). For these variables, the numbers 
of levels should be identified from the aims and objectives (scope) of the study. Where 
both qualitative and quantitative explanatory variables are to be included, a pilot study
with a few (combined) levels of the qualitative explanatory variable(s) might be used to
screen for important quantitative explanatory variables. In any scenario, the convenience
of this screening approach should be balanced against the possibility of missing important
interactions.
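
As an illustration of the two-level screening arrangement described above, the following minimal Python sketch enumerates every low/high combination for a set of quantitative explanatory variables. The function name and the variable names (temperature, nitrogen, irrigation) are hypothetical and chosen only for illustration. A fractional scheme such as a Plackett-Burman design would use only a subset of these runs and is not shown here.

```python
from itertools import product


def two_level_factorial(variables):
    """Return every low/high combination of the named explanatory variables."""
    names = list(variables)
    return [
        dict(zip(names, combination))
        for combination in product(("low", "high"), repeat=len(names))
    ]


# Hypothetical explanatory variables, used only to illustrate the idea.
treatments = two_level_factorial(["temperature", "nitrogen", "irrigation"])
for treatment in treatments:
    print(treatment)
```

With k variables the full arrangement has 2^k treatment combinations, so the number of runs grows quickly as more variables are screened.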

For quantitative explanatory variables, it is necessary to select the number and spac-
ing of the values (levels) to be used. This is easier in the context of designed experiments,
where levels are directly under the control of the experimenter, but should also be consid-
ered in observational studies. Without prior knowledge to suggest otherwise, it is difficult
to argue against having equally spaced levels. In a simple regression modelling context,
the response is usually assumed to be linear but a low-order polynomial (Section 17.1.2), or
polynomial contrast in the context of a multi-stratum ANOVA (Section 8.7), might also be
appropriate. The number of levels needs to be sufficient to allow the fit of the selected poly-
nomial model to be assessed; this requires at least three levels for a straight line model,
four levels for a quadratic model, and higher-order polynomial models require additional
levels. Replication can be used to give a direct test for lack of fit (Section 12.8). In the more
general context of curved relationships (Chapters 17 and 18), it is important that levels
cover the regions of greatest interest, which are often regions where the pattern of response
changes most. It is always important that the selected levels span the full range of values
relevant to the aims and objectives of the study. Where several quantitative explanatory
variables are used, the ideas of factorial structures still apply, so observations should be
selected to span both the range of interest for each individual explanatory variable, and,
ideally, the combined ranges for all explanatory variables.
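
The counting argument for the number of levels can be sketched in a few lines of Python. This is only an illustration under simple assumptions: a polynomial of degree d has d + 1 parameters, so at least one further level is needed to leave a degree of freedom for assessing the fit. The function names and the dose range of 0 to 100 are hypothetical.

```python
def min_levels_to_assess_fit(degree):
    """Smallest number of distinct levels that leaves at least one degree
    of freedom for assessing the fit of a polynomial of this degree."""
    n_parameters = degree + 1  # intercept plus one coefficient per power
    return n_parameters + 1


def equally_spaced_levels(low, high, n_levels):
    """Equally spaced levels spanning the full range from low to high."""
    step = (high - low) / (n_levels - 1)
    return [low + i * step for i in range(n_levels)]


print(min_levels_to_assess_fit(1))  # straight line
print(min_levels_to_assess_fit(2))  # quadratic
# Hypothetical dose range, for illustration only.
print(equally_spaced_levels(0, 100, 5))
```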

A final issue with regard to the choice of explanatory variables in a study is the need
to include some sort of control or standard treatment (discussed in Section 8.5). Most tri-
als use some sort of control treatment with known properties, to give assurance that the
experiment has run as expected. In Section 8.5, we identified three different types of con-
trol: the positive control, the negative control and the standard. The inclusion of several
controls will usually be relevant for studies concerned with the treatment of some detri-
mental activity, such as weed, disease or pest control in agricultural crops. The positive
control provides the best possible response, and can provide a benchmark against which
any new treatments can be compared. For example, in insecticide trials, a positive control
might be some form of exclusion treatment that ensures that no pests infest the crop. By
contrast, the negative control provides the worst-case scenario, and is often useful only in
checking that some control of the detrimental activity is needed. For example, in insecti-
cide trials, a negative control would be the lack of any chemical (or other) treatment, pro-
viding evidence of a pest infestation, and hence that the insecticide treatments are having
a beneficial impact in controlling the pest. Finally, the standard or reference treatment can
provide a known response or target value, so for an insecticide trial this might be the best
commercially available product (or the most commonly used commercial product), and
potential new products must perform at least as well as this standard treatment.



19.1.2 Resources, Experimental Units and Constraints 

An important starting point when considering the resources to be used in a scientific 
study is to identify the experimental units. The choices made depend both on the aims of 
the study, and on the scientific methods used. For experimental studies, a useful defini- 
tion was provided by Cox (1961), who stated that 'an experimental unit corresponds to the 
smallest division of the experimental material such that any two units may receive dif-
ferent treatments in the actual experiment'. In Section 3.1, we were slightly more precise,
defining the experimental unit for each explanatory variable separately, since explanatory
variables may be applied at different levels of the experimental material. A similar defi- 
nition can be provided for observational studies, an experimental unit being defined, by 
analogy, as the smallest division of the biological material such that any two units may 
have different levels of the explanatory variable. Various examples of experimental units 
were listed in Section 3.1. It is also helpful to identify the measurement unit, the (bio- 
logical) material on which each measurement is made. The measurement units may be the 
same as the experimental units for one or more explanatory variables, but often differ, as 
discussed in Section 3.1. The measurement unit almost always corresponds to the lowest 
level of experimental material. Where different experimental units are used for different 
explanatory variables, the ideas of multi-stratum designs should be used, such as the split- 
plot design introduced in Section 9.2. 

Having identified the experimental and measurement units, the next step is to determine 
the maximum number of units that are available for the study. Often, this will be defined 
by the cost associated with using each unit (applying treatments, recording responses) 
and some constraint on the total funding available for the study. Other forms of constraint 
might include the amount of time taken to process each unit, or the physical space that is 
available within the experimental facility (e.g. glasshouse, controlled environment room 
or cabinet, incubator). It will sometimes be necessary to be able to complete the study 
within a certain period of time, or using some specific experimental facility. Where dif- 
ferent sizes of experimental unit are required, resulting in a multi-stratum design, these 
issues need to be considered for each stratum in turn, including choices about the relative 
numbers of units at each level of the structure. 

It is also important to identify any anticipated systematic sources of variability or struc- 
ture within the experimental material. This might be caused by the way in which the 
experimental units have to be managed (e.g. a constraint on the number of units that can 
be processed within a certain period of time, or that can be contained within some physi- 
cal space), or by the origin of the experimental material (e.g. plants raised from differ- 
ent batches of seed, or leaves on the same plant). Those units expected to have similar 
responses in the absence of any treatment should be grouped together into blocks, so that 
the systematic variation between these blocks can be separated from the background varia- 
tion, hence increasing the precision of treatment comparisons by reducing the unit-to-unit 
background variability. This blocking is incorporated into the structural component of the 
model. In many situations, there will be multiple potential sources of variation, and so we 
may want to account for all sources in constructing the design. In Section 3.3, we discussed 
the distinction between nested and crossed structures; the presence of crossed structures 
naturally leads towards some form of row-column design (see Section 9.1), while nested 
structures with experimental units at several different levels suggest variations on the 
split-plot design (Section 9.2). 



19.1.3 Matching the Treatments to the Resources 

Having identified the combination of explanatory variables (treatment structure) to be 
included and the resources to be used for the study, the final step in constructing the 
design is to combine these two components. An important part of this process is deter- 
mining the level of replication required to make likely the statistical detection of any treat- 
ment differences regarded as biologically important - i.e. to allow the demonstration of 
statistical significance for treatment differences large enough to be of biological interest. 
As discussed in Chapter 10, the amount of replication required depends on a number of
(possibly competing) elements, including the explanatory model, the magnitude of the dif-
ference to be detected and the variability associated with the experimental units and/or
measurement process. Where such information is available, a power analysis (Section 10.3)
can be used to calculate the required replication.
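
To show how these elements combine, the following Python sketch gives a rough replication calculation for comparing two treatment means with equal replication. It uses a Normal approximation to a two-sided test, which is an assumption made here for simplicity and is not necessarily the calculation presented in Section 10.3; a calculation based on the t-distribution would give slightly larger values. The function name and the example values of the standard deviation and difference are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def replicates_per_treatment(sigma, delta, alpha=0.05, power=0.8):
    """Approximate replication needed per treatment to detect a difference
    delta between two treatment means, given unit-to-unit standard
    deviation sigma (Normal approximation, two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)


# Hypothetical values: detect a difference equal to one standard deviation.
print(replicates_per_treatment(sigma=1.0, delta=1.0))
# Halving the difference to be detected roughly quadruples the replication.
print(replicates_per_treatment(sigma=1.0, delta=0.5))
```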

Assuming that sufficient resources are available, the construction of the design then
just depends on matching the treatment structure to the resources, taking account of
any blocking or other structure required. Where no blocking or other structural con-
straints need to be accounted for, then a completely randomized design can be used
(CRD; Chapter 4). In some circumstances, where blocking is used for administrative
convenience rather than to account for unavoidable heterogeneity, there is flexibility to
choose the block size to match the number of treatments, resulting in a randomized com-
plete block design (RCBD; Chapter 7). More usually, sensible block sizes will not nec-
essarily match the desired number of treatments, or several levels of structure may be
present, and then some more complex design will be necessary. Some simple ideas were
introduced in Chapters 9 and 11, but a wide range of design approaches are possible, as
described in Mead et al. (2012).
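
For the simplest blocked case, the allocation of treatments to units can be sketched as follows. This is a minimal illustration of randomizing a RCBD, in which every treatment appears once in each block in an independently shuffled order; the function name, treatment labels and seed are hypothetical, and in practice the randomization would usually be produced by statistical software.

```python
import random


def randomize_rcbd(treatments, n_blocks, seed=None):
    """Return one independently shuffled order of treatments per block."""
    rng = random.Random(seed)  # a seed makes the layout reproducible
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)
        rng.shuffle(order)
        layout.append(order)
    return layout


# Hypothetical treatment labels, for illustration only.
layout = randomize_rcbd(["A", "B", "C", "D"], n_blocks=3, seed=1)
for block_number, order in enumerate(layout, start=1):
    print(f"Block {block_number}: {order}")
```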

In cases where there is little or no information available about the various sources of
background variation, as at the start of a new project, it may be sensible to run a pilot
study to gather information before embarking on any major experimentation. Such stud-
ies can be used to also provide some preliminary information about the key explanatory
variables (e.g. to identify the range of a quantitative explanatory variable to be included).
But unless these preliminary studies are done in a way that makes them compatible with
the main experiments, they may represent a sub-optimal use of resources. One way to
avoid this is to use an adaptive or sequential design approach (see, e.g. Mead et al., 2012,
Chapter 20), where each stage of the experimental process provides information for the
following stages, and each stage can be analysed separately or as part of the whole series.

One way of assessing how effectively the design of an experiment uses the available
resources is to evaluate the division of the resources, as measured by the degrees of free-
dom, between and within strata where treatment comparisons are made. As discussed in
Section 10.2, a reasonable 'rule of thumb' is that there should be between 10 and 20 residual
degrees of freedom in each stratum of the design where treatment comparisons are made;
this ensures a reasonable estimate of background variability is obtained. Having too few
(< 10) residual degrees of freedom in a stratum may result in low power for detecting treat-
ment differences in that stratum; so increasing the replication at that level of the design
might be sensible. Having too many (> 20) residual degrees of freedom in a stratum gives
no real advantage, and may imply that replication (in that stratum) can be reduced or that
the opportunity to answer additional questions, through the inclusion of further treat-
ment factors, should be taken. In balanced designs with a nested structure, such as the
split-plot design (Section 9.2), there are a fixed number of smaller units nested within each
larger unit, and so changing the replication of larger units also changes the total number
of smaller units. In these designs, the residual df are larger (sometimes much larger) for
the lower strata, and this imbalance of information is one of the disadvantages of nested
designs.
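
The rule of thumb can be checked with a short calculation of residual degrees of freedom. The Python sketch below assumes the standard formulae for a RCBD and for a balanced split-plot design with whole plots arranged in complete blocks; the function names and the example design sizes are hypothetical.

```python
def rcbd_residual_df(n_blocks, n_treatments):
    """Residual df for a RCBD: (blocks - 1) x (treatments - 1)."""
    return (n_blocks - 1) * (n_treatments - 1)


def split_plot_residual_df(n_blocks, n_whole, n_sub):
    """Residual df in the whole-plot and sub-plot strata of a balanced
    split-plot design with whole plots arranged in complete blocks."""
    whole = (n_blocks - 1) * (n_whole - 1)
    sub = n_whole * (n_blocks - 1) * (n_sub - 1)
    return whole, sub


def check_rule_of_thumb(residual_df, low=10, high=20):
    """Compare residual df with the 10 to 20 rule of thumb."""
    if residual_df < low:
        return "too few"
    if residual_df > high:
        return "more than needed"
    return "adequate"


# Hypothetical split-plot: 4 blocks, 3 whole-plot and 4 sub-plot treatments.
whole_df, sub_df = split_plot_residual_df(n_blocks=4, n_whole=3, n_sub=4)
print(whole_df, check_rule_of_thumb(whole_df))
print(sub_df, check_rule_of_thumb(sub_df))
```

In this hypothetical example the sub-plot stratum has far more residual df than the whole-plot stratum, which is the imbalance of information described above.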



19.1.4 Designs for Series of Studies and for Studies with Multiple Phases 

In many cases, an individual study forms part of a larger series. This is almost always the 
case for field trials, where results for a single trial in a single year can be notoriously unrep-
resentative. For example, official guidance for studies concerned with the development
and testing of new crop protection products (e.g. European and Mediterranean Plant
Protection Organization; http://www.eppo.int/) indicates that experiments should be 
repeated across multiple sites, at which pest presence or intensity or timing might vary, as 
well as in multiple years, in which environmental conditions might impact on the effect of 
different treatments. Ideally, similar designs will be possible for each site × year combina-
tion, but sometimes there will be different constraints for each trial, requiring different
designs. Similar issues occur in crop breeding programmes, as new varieties are required
to perform well across a wide range of environments. In this context, limited seed in early
generation trials may produce constraints, so that not all potential lines can be trialled
in all environments within the same year, and new lines will be introduced in following
years with less promising lines dropped from the programme.

For most series of studies, it is important that data from each separate study (e.g. at a
single site in a single year) can be analysed on its own, so that the individual characteristics
of each study can be determined before combining the data. The simplest design for a set of
studies would use the same set of treatments and same design for each study. In this case, if
the background variation is similar across the studies, then a combined analysis is straight-
forward within the ANOVA framework, incorporating study as a high-level structural
component within which the common design is nested, and allowing for an interaction
between the within-study explanatory component and study (i.e. the possibility of different
treatment effects in the different studies). If the background variation differs, or if the set of
treatments is common but a different design is used for each study, then a combined analy-
sis is possible but must account for the individual study designs and allow for different
levels of background variation; this can be achieved using linear mixed models (Chapter
16). Where a different set of treatments is used in each study, then some overlap - a subset of
common treatments - must be present if the trials are to be analysed together. A combined
analysis in these circumstances relies heavily on the assumption that no study × treatment
interaction exists, particularly for the comparison of treatments not tested within the same
trial. If several common treatments are present, then this assumption can be tested (to a
limited extent) within that set of common treatments, but this assumption cannot be tested
at all where only one common treatment is used. Where a set of studies cannot all use the
same treatments, we therefore strongly advise the use of a large overlap, with the common
set preferably including treatments that span the full range of responses. Again, analysis of
the combined set of studies usually requires the use of linear mixed models.

Careful thought about design is also needed in the increasingly common context of two-
or multi-phase studies. These often occur where a crop is grown in the field and then har-
vested and processed in the laboratory or to produce some food product. Treatments may
be applied in both the field and laboratory phases, and the design must account for struc-
ture in both the field and the laboratory, as well as ensuring that suitable harvest samples
can be obtained for the later processing phase. For example, a study to examine factors
influencing bread-making quality of wheat might use a RCBD with several varieties and
fertilizer regimes in the field, then split the harvest from each plot into four sub-samples,
each to be used for a different variation in the bread-making process. The four samples
from each plot are processed together, with each plot processed on a separate day. The
experimental structure must account for blocking in the field and processing time in the
laboratory, as well as all treatment effects. Such studies might then involve further phases,
for example, the taste testing of the resulting bread by several people, each giving a subjec-
tive score. In designing multi-phase studies, it is important to be aware of all constraints
during each phase of the study, to ensure that the effects of each treatment (and interac-
tions) can be extracted in the analysis, and to identify the structure within each phase. It
can be useful to confound structure between phases, for example, using blocks in the field
as blocks in the laboratory, and it may be necessary to allow separate analyses to be made
at the end of each phase. A useful approach and overview has been developed by Brien
and Bailey (2006).

Multi-phase trials are now common in the use of high-throughput technologies devel- 
oped in the 'omics revolution, where the impact of different treatments on the levels of 
gene, protein or metabolome expression is measured. Typically, these studies involve an 
experimental phase (in the field or, more usually, controlled environment) during which 
treatments are applied, and plant material from each experimental unit is then harvested 
and processed to produce one or more samples used for the 'omics phase. In this con- 
text, there are often severe cost constraints on the total number of samples that can be 
used, and many technologies can process only small numbers of samples simultaneously. 
Given that substantial costs may be involved in obtaining expression readings for a single 
sample, it is important to ensure that the experimental phase is well designed, taking 
account of any constraints in the 'omics phase. Common issues at the 'omics phase include 
the allocation of experimental treatments to small blocks, the balance between different 
types of replication, and the assessment of response along a time course. Case Study 19.2 
discusses the allocation of experimental treatments to small blocks in the context of two- 
channel microarray gene expression studies. Most 'omics studies include both biological 
replication (samples from different biological organisms) and technical replication (several 
sub-samples prepared from each unit in the experimental phase to allow for variation dur- 
ing the sample processing and measurement phases). Technical replication is particularly 
important if variability in the sample processing or measurement stages is large, whereas 
biological replication is important to ensure that results are not specific to one organism 
or sample. For time course studies, involving samples collected over time, it is important 
to identify whether the data form a cross-sectional study (samples collected from differ- 
ent organisms at each time point), or a longitudinal study (samples collected repeatedly 
from the same organisms over time). A longitudinal study may provide more precise com- 
parisons across time points, but the analysis must account for correlation between samples 
taken from the same organism. 



19.1.5 Design Case Studies 

As previously noted, each study is unique, and so it is impossible to provide a generic 
recipe for how to design any study. However, to illustrate some of the issues identified 
above, we present three case studies from our own experiences. 

Case Study 19.1: Designing a Large-Scale, Multi-Site Field Experiment 

Spring 2000 marked the beginning of an extensive ecological experiment, known as 
the Farm Scale Evaluations (FSEs). The aim of the study was to compare the effects of 
two treatments on various indicators of farmland biodiversity, including both plants 
and invertebrates. The two treatments represented the composite effects of manage- 
ment practices associated with genetically modified herbicide-tolerant (GMHT) and 
conventional crop varieties. Simultaneous experiments were to be carried out for beet, 
maize, and spring and winter oilseed rape crops. The null hypothesis for each crop
was that there was no difference between the two treatments in abundance and diversity
of the chosen indicators. The two-tailed alternative hypothesis was that the treatments
differed, in either a positive or negative direction.

The FSEs, carried out by a consortium of UK research institutes, were to form the
largest, and most highly scrutinized, ecological study of its kind to date, costing in the
region of £6 million (Clark et al., 2006). Undertaking an experiment of such magnitude
required much planning in practical, biological and statistical terms. The project began
in 1999 with a pilot study to develop sampling protocols and to inform a statistical
study to determine an appropriate design for the experiment. This statistical study
examined choice of design structures, the power of any potential design to detect treatment
differences of a given size, and choice of subsequent analysis approaches for the
data collected. Full details can be found in Perry et al. (2003) and Rothery et al. (2003);
here we focus on four specific design issues: choice of experimental unit, allocation of
treatments to units, estimation of sample size and sampling strategy.

As it was important that the results of the FSEs were representative of commercial
British agriculture as a whole (e.g. farm location, farming intensity, weather conditions,
soil types), farms throughout Britain, especially in those areas where the chosen
crops were typically grown, were to be selected to take part in the study. The most
pertinent design issue was then the definition of the experimental unit. Two choices
were considered (Figure 19.1): half-fields within whole fields (i.e. a RCBD with fields as
blocks and half-fields as experimental units), or whole fields within farms (i.e. a RCBD
with farms as blocks and fields as experimental units). There were many biological
considerations (e.g. mobility of insects and their behaviour at different spatial scales)
but these had to be balanced with statistical considerations. The primary statistical
argument for the half-field option was the potential reduction in residual variation
that might be achieved due to two half-fields being more similar to each other (e.g. in
soil type, surrounding habitat, previous management) than two paired whole fields.
Limited data from previous studies, coupled with the small amount of data from the
pilot study, suggested that half-fields were indeed likely to be less variable than paired
whole fields. Other more practical issues, such as the availability of whole fields and
ease of sampling, were also contributing factors. The final decision was made to use
half-fields as the experimental unit. Only one field per farm was used in each year, but
some farms were sampled in 2 or 3 years, with a different field used in each year.

FIGURE 19.1

Schematic representation at one farm of two possible choices of experimental unit for the FSE study (Case Study
19.1). Both choices correspond to a RCBD, but in (a) treatments are applied at random to half-fields (units) within
a field (block), and in (b) treatments are applied at random to whole fields (units) within the farm (block).

The boundary line used to split any field into two was first determined by assessing
the many factors that might influence the variability of wildlife within the field; the
optimal choice being the line that divided the field into two halves as close to identical
as possible with respect to these non-treatment influences on biodiversity. An
example of the detailed plans drawn up to inform this decision for each field is shown
in Figure 19.2. In this case, the main criterion for the location of the boundary is the
presence of a field of beet to the south-east (right of plan) and the presence of nursery
buildings to the north-west (left of plan) of the field. A split running from north-east
to south-west (vertically on plan) would potentially confound the treatments with this
environmental difference. The chosen split running north-west to south-east (horizontally
on plan) ensures that each half-field has boundaries including the beet crop
and nurseries, as well as hedgerows, gardens and grass paddocks. The protocol then
labelled the most northerly half-field (or westerly, depending on the overall orientation
of the field) as 'A' and the other half as 'B'. An envelope containing a predetermined
randomization of the two treatments to halves A and B was then opened to
give the final allocation of treatments. For example, in Figure 19.2, half A (west) was
sown with the GMHT variety and half B (east) with the conventional variety. This
two-stage protocol ensured that the allocation of treatments was not influenced by
any of the parties involved in the experiment, and that the final results would have
statistical validity.

FIGURE 19.2

Detailed field plan showing characteristics of the field to be sampled and other features in the immediate locality
that might influence the variability of wildlife within the field, the final choice of boundary for splitting into
two half-field units, the final allocation of treatments to half-fields, and locations of within-half-field sampling
transects T1-T12 (Case Study 19.1). (Courtesy of Matthew Skellern, Rothamsted Research.)

Next, the sample size (number of whole fields) had to be determined. As noted above,
very little existing data were available to inform any power analysis. So, instead, a
simulation study was done to complement the analysis of data collected from the pilot
study. The full details are given in Rothery et al. (2003). Briefly, count data were simulated
according to the Negative Binomial distribution (Section 18.3.4) for a range of
scenarios which included different overall mean counts (1, 5, 10 and 50), field effects
covering a 100-fold span in variation, sizes of multiplicative treatment effects (1.3-fold,
1.5-fold and 2-fold), levels of variability (%CV = 50%, 80%, 100%) and values of the
Negative Binomial exponent parameter (allowing the background variance to be proportional
to the expected value or proportional to the square of the expected value).
Power was estimated using randomization tests (e.g. Section 5.2.4) and 500 sets of simulated
data for each of five sample sizes (n = 20, 30, 40, 60 and 90 fields) for each scenario.
The results of the power study were complex but the final recommendation was to aim
to achieve 60 fields per crop (equivalent to 20 per year over the 3-year period of the
experiment). The power of this scheme for detecting 1.5-fold treatment differences and
achieving a 50% CV was estimated to exceed 80% for many of the scenarios studied.
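
The general shape of such a simulation-based power estimate can be sketched in a few lines of
Python. This is a minimal illustration only and not the procedure of Rothery et al. (2003): the
log(count + 1) paired statistic, the sign-flip form of the randomization test, the log-normal
field effects, the crude Poisson sampler and all default parameter values (k, field_sd, n_sim,
n_perm) are assumptions made here for illustration.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson count (Knuth's method; normal approximation for very large lam)."""
    if lam > 500:  # avoid exp() underflow; assumption: the approximation is adequate here
        return max(0, round(rng.gauss(lam, math.sqrt(lam))))
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def neg_binomial(rng, mean, k):
    """Negative Binomial count as a gamma-mixed Poisson: variance = mean + mean**2 / k."""
    return poisson(rng, rng.gammavariate(k, mean / k))


def simulate_fields(rng, n_fields, mean, effect, k, field_sd):
    """Per-field differences in log(count + 1) between treated and control half-fields."""
    diffs = []
    for _ in range(n_fields):
        field_mean = mean * math.exp(rng.gauss(0.0, field_sd))  # multiplicative field effect
        control = neg_binomial(rng, field_mean, k)
        treated = neg_binomial(rng, field_mean * effect, k)
        diffs.append(math.log(treated + 1) - math.log(control + 1))
    return diffs


def randomization_p_value(rng, diffs, n_perm=199):
    """Two-sided randomization test: randomly swap treatment labels within each field."""
    observed = abs(sum(diffs))
    hits = 0
    for _ in range(n_perm):
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def estimate_power(n_fields, mean, effect, k=4.0, field_sd=1.0,
                   n_sim=100, alpha=0.05, seed=1):
    """Proportion of simulated data sets in which the randomization test rejects."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        diffs = simulate_fields(rng, n_fields, mean, effect, k, field_sd)
        if randomization_p_value(rng, diffs) <= alpha:
            rejections += 1
    return rejections / n_sim


if __name__ == "__main__":
    for n in (20, 60):
        print(n, "fields:", estimate_power(n, mean=10, effect=1.5))
```

Because each field contributes a within-field difference, the field effect largely cancels, which
mirrors the argument for using half-fields as units. A fuller study would loop over the grid of
means, effect sizes and variability levels listed above and use more simulated data sets per
scenario.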

Finally, the field-sampling protocols involved taking measurements from up to 12
transects per half-field, each extending from the field edge in towards the centre of the
field and spaced as evenly as possible around the three non-treatment-boundary field
edges (see Figure 19.2). On each transect, there were five potential sample points at
distances of 2, 4, 8, 16 and 32 m into the field. Up to 60 pseudo-replicate observations
were therefore made per half-field, and the sub-samples were pooled to give half-field
totals for analysis. Nevertheless, the within-half-field sub-sampling gave useful information
to later assess and compare variability in the responses at various spatial scales
(see Clark et al., 2007).

In Example 16.2, we presented an analysis of one data set collected during the FSEs.
There we wrote the model for the data using symbolic notation as

Structural component: Farm/Field/DHalf

Explanatory component: [1] + Year*Treatment

In that example, the factor Field labelled fields within farms (1-3), and the resulting
multi-stratum ANOVA table (Table 16.8) contained three strata relating to farms, whole fields
within farms and half-fields within whole fields within farms. The first publications of
FSE results (e.g. Brooks et al., 2003; Haughton et al., 2003; Hawes et al., 2003; Heard et al.,
2003; Roy et al., 2003; Bohan et al., 2005) focussed on the treatment effects, which are
estimated within fields. Effects in higher strata were of less interest and so the farm and field
effects were combined by specifying the structural component of the model as

Structural component: Farm.Field/DHalf

The Year main effect was also excluded from the explanatory model. As a result of these
two changes, the two upper strata in Example 16.2 were combined into a single stratum
subsuming the farm, whole field and year main effects. In addition, preliminary analyses
showed no evidence of treatment x year interactions (as in Example 16.2), and hence
this term was also excluded from the model, with the explanatory structure specified as

Explanatory component: [1] + Treatment



Case Study 19.2: Multi-Phase Experiments - Gene Expression 
Microarrays for Plant Response to Pathogens 

One important application of multi-phase experiments is the study of gene expression 
responses in plants grown in different environmental conditions, or exposed to different 
stress treatments. Here, we discuss a microarray study concerned with the measurement 
of gene expression responses in Arabidopsis thaliana, the model plant, over a 24-h time 
period post-infection with the pathogen Botrytis cinerea. Initial interest was in assessing 
the impact of infection by two contrasting isolates of the pathogen, with a third 'mock 
inoculation' treatment (inoculation with water) included to provide a baseline response. 

The first phase of the study was the production of plant material to be inoculated, 
from which samples of genetic material (RNA, cDNA) could be obtained for process- 
ing before application to the microarrays. The pathogen isolates and mock inoculation 
were to be applied to detached leaves, with the whole leaf then being processed to 
generate the genetic sample. Hence, separate leaves were needed at each time point for 
each inoculation treatment. Different plants would be used for replicates of the treatments,
but there were several options for use of plants within replicates. Three options
were considered, as illustrated in Figure 19.3:

a. Use a separate plant for each inoculation treatment with leaves within plants 
allocated to the time points (i.e. plants are the experimental units for inocula- 
tion and leaves within plants are the experimental units for time) 

b. Use a separate plant for each time point, with leaves within plants allocated to 
the different inoculation treatments (i.e. plants are the experimental units for 
time and leaves within plants are the experimental units for inoculation) 

c. Use a separate plant for each inoculation x time point combination, but use 
a specific leaf (e.g. the seventh true leaf) within each plant (i.e. the plant.leaf 
combinations are the experimental units for both inoculation and time) 



FIGURE 19.3

Three options for selecting leaves from plants to be treated with three different inoculation treatments and
incubated for two different time periods, using separate plants for each (a) inoculation treatment, (b) time point
and (c) inoculation x time point combination (Case Study 19.2). Highlighted boxes represent individual leaves
to be sampled; numbers in boxes indicate leaf numbers within plants.

Option (a) would use fewest plants, and within-plant comparisons would avoid
genetic variation between plants, therefore potentially providing for a more precise
comparison of responses between time points. Option (b) would use more plants
(assuming more than three time points are used), but would potentially provide a more
precise comparison of responses between inoculation treatments within a time point.
However, previous studies had identified substantial variation in gene expression
between leaves of different ages, and this variation was often greater than that between
plants. Hence option (c), which inoculates leaves of the same age, was preferred.

The total number of plants required then depended on both the number of replicates
required for each inoculation x time combination and the number of time points. The
researchers expected that there would be subtle changes in gene expression over the
first 12 h, though some genes were not expected to show any response until 18-20 h
post-infection. With a wide range of potential shapes of expression profiles over time, it
was considered best to have the sampling times equally spaced: possibilities included
sampling every hour (25 time points, starting immediately after inoculation), every 2 h
(13 time points), every 3 h (9 time points) or every 4 h (7 time points). While replicate
samples (technical replicates) would be generated during the post-harvest processing,
so that the gene expression for each plant sample would be measured on multiple
microarrays, it was also important to be able to compare the variation in gene
expression due to the different treatment (inoculation x time) combinations with the
between-plant (biological) variation. Therefore, it was considered necessary to also
include replicate plants for each treatment.

The plants were to be grown in controlled environment cabinets (to minimize variation
due to the growing environment), with two separate cabinets available. Each cabinet
had two shelves with space for 48 plants to be grown on each shelf in an array of
four rows of 12 plants (Figure 19.4), giving an upper limit of 192 plants.

FIGURE 19.4

Arrangement of plants on two shelves within one CE cabinet, with random allocation of sampling times (T1 ...
T13) to sets of three adjacent plants within rows, and of three inoculation treatments (M = Mock, I1 = Isolate 1,
I2 = Isolate 2) to plants within each set (Case Study 19.2). Dashes indicate unused positions.

Biological replicates could be processed separately but, within each replicate, leaves
for all treatments must be harvested at the same time, inoculated, and then sampled
at different times after inoculation. Sampling every hour would require 75 plants
per biological replicate (3 inoculations x 25 time points), so that only two replicates
would be possible (2 x 75 = 150 plants), while sampling every 4 h would allow up to
nine replicates (3 inoculations x 7 time points = 21 treatments, 9 replicates x 21 treatments
= 189 plants). The choice here is between improved precision for comparison of
treatment differences at given time points, and better information about the pattern
of changes in gene expression over time. The compromise was to sample every 2 h
(13 time points), which allows a reasonable number of time points over which to measure
the responses of late-expressing genes. This scheme results in 39 treatments and
each replicate could comfortably fit on a single shelf, allowing four complete biological
replicates. Any systematic differences between biological replicates introduced at
later stages could then be confounded with differences between cabinets and shelves.
There was potential variation both along and between rows within each shelf. It was
therefore decided to randomly allocate sets of three adjacent plants in a row to a particular
time point, with the three inoculation treatments randomly allocated (but not
yet applied) to plants within each of these sets. The arrangement for one cabinet is
shown in Figure 19.4, and a new randomization was used for the second cabinet. If
data were measured at this point of the experiment, the structural component of the
model would take the form

Structural component: Cabinet/Shelf/Set/Plant
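
The two-level randomization described above (time points to sets, then inoculation treatments
to plants within sets) could be generated along the following lines. This is a sketch under
stated assumptions rather than the procedure actually used: it assumes that the sets occupy
fixed groups of three adjacent positions (columns 1-3, 4-6, 7-9 and 10-12 of each row), that
each shelf is randomized independently, and the function names and seed are hypothetical.

```python
import random

TIMES = [f"T{i}" for i in range(1, 14)]   # 13 two-hourly sampling times
INOCULATIONS = ["M", "I1", "I2"]          # mock inoculation and two isolates
ROWS, COLS, SET_SIZE = 4, 12, 3           # one shelf: four rows of 12 positions


def randomize_shelf(rng):
    """Return a ROWS x COLS grid of (time, inoculation) labels; None marks unused positions."""
    sets_per_row = COLS // SET_SIZE
    set_slots = [(r, s) for r in range(ROWS) for s in range(sets_per_row)]  # 16 possible sets
    chosen = rng.sample(set_slots, len(TIMES))   # 13 sets, in random order, one per time point
    grid = [[None] * COLS for _ in range(ROWS)]
    for time, (row, slot) in zip(TIMES, chosen):
        order = rng.sample(INOCULATIONS, SET_SIZE)  # random order of treatments within the set
        for offset, inoculation in enumerate(order):
            grid[row][slot * SET_SIZE + offset] = (time, inoculation)
    return grid


def print_shelf(grid):
    """Print one shelf, with dashes for unused positions."""
    for row in grid:
        print(" ".join("  -   " if cell is None else f"{cell[0]:>3}:{cell[1]:<2}"
                       for cell in row))


rng = random.Random(2024)  # fixed seed only so that the layout can be reproduced
layout = {(cabinet, shelf): randomize_shelf(rng)
          for cabinet in (1, 2) for shelf in (1, 2)}
print_shelf(layout[(1, 1)])
```

Each shelf then holds 39 plants (13 sets of three) with nine unused positions, and the nesting of
plants within sets within shelves within cabinets matches the structural component given above.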

The four biological replicates were harvested on four separate days. For each plant,
the seventh true leaf was excised and the allocated inoculation treatment applied. At
each 2-hourly time point, the appropriate set of three leaves (i.e. the set previously
allocated to that time point) was freeze-dried to stop any further development prior
to processing to obtain the genetic material. Throughout the subsequent processing
steps (amplification, labelling), the 39 samples in each biological replicate were processed
together where possible. Where this was not possible, samples were processed
in batches comprising either the 13 samples for a particular inoculation treatment
or the three samples for a particular post-infection sampling time, with the order in
which the batches and samples within batches were processed being randomized. As
a two-channel microarray system was to be used to assess the gene expression for each
sample, the labelling phase required the division of each sample into two sub-samples,
one to be labelled with each dye. This step automatically introduces some
processing/measurement replication, with the potential for further such technical replication
to be introduced during the microarray phase.

The final phase of the study involved the allocation of samples to microarrays to measure
relative gene expression levels. Two separate samples can be compared directly
on each array (essentially each array is a block of size two), with consistent differences
in responses between the two dye labels also expected. The labelling of each sample
with both dyes had already satisfied the 'dye-balance' principle, with each treatment
measured using both dyes. There were 312 samples (three inoculation treatments x 13
time points x four biological replicates x two sub-samples) to be allocated to arrays,
requiring a minimum of 156 two-channel arrays (where the two channels relate to the
two wavelengths used to read the expression response for the different dyes). Clearly,
it would not be possible to directly compare all pairs of treatments using a reasonable
number of arrays and so the strategy must be to directly make the comparisons of most
interest. The most important comparisons are between adjacent time points within
each inoculation treatment and between the different inoculation treatments at each
time point. It is also of interest to allow direct comparison across the different biological
replicates so that interactions can be investigated. To allow all of these comparisons
to be made, each sub-sample was used on two arrays, resulting in a total of 312 arrays,
with each plant sample being measured on four separate arrays, two of these technical
replicates being labelled with each dye. The allocation of treatments to microarrays
was split into two parts, each using 156 arrays, with each part focussing on a different
set of comparisons.

The first part of the design focussed on the comparison of samples along the time
course using a 'loop design' (Kerr and Churchill, 2001; Wit et al., 2005). Within each
inoculation treatment and biological replicate, each time point appeared on an array
with samples from the previous and next time points, and the two occurrences of each
time point were labelled with different dyes (as shown in Table 19.1a).
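
A loop of this kind is easy to enumerate. The sketch below builds the arrays for one
inoculation treatment and biological replicate, assuming that array t carries time point t on
the first dye and the following time point on the second dye, and that the loop is closed by
pairing the last time point with the first (which is what 13 arrays for 13 time points implies).
The dye order and the function name are assumptions for illustration; the actual allocation is
the one given in Table 19.1a.

```python
import math


def loop_design(n_times=13):
    """Arrays for one block, as (time on dye 1, time on dye 2) pairs forming a closed loop."""
    return [(t, t % n_times + 1) for t in range(1, n_times + 1)]


arrays = loop_design()
n_blocks = 3 * 4  # inoculation treatments x biological replicates

for array_number, (dye1_time, dye2_time) in enumerate(arrays, start=1):
    print(f"array {array_number:2d}: dye 1 = T{dye1_time:<2d} dye 2 = T{dye2_time}")

print("arrays per block:", len(arrays))
print("arrays in Part 1:", n_blocks * len(arrays))
print("distinct time-point pairs compared directly:",
      len({frozenset(pair) for pair in arrays}), "of", math.comb(13, 2))
```

Every time point appears on exactly two arrays, once with each dye, so the dye-balance
principle holds within each block, and 12 blocks of 13 arrays account for the 156 arrays used
in this part of the design.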

TABLE 19.1

Allocation of Treatments to Microarrays and Dyes:
(a) Allocation of Time Points (1-13) for Each
Inoculation Treatment and Biological Replicate in
Part 1; (b) Allocation of Inoculation Treatments (M,
1 or 2) and Biological Replicates (a, b, c, d) for Each
Time Point in Part 2 (Case Study 19.2)

This provided 12 blocks (one for each combination of the three inoculation treatments
and four biological replicates) of 13 arrays, and each block was processed on a
separate day. Ignoring the controlled environment phase of the design, a model for this
part could be written as

Structural component: (Day/Array)*Channel

Explanatory component: [1] + Dye + Inoculation*Time

This is a partially balanced incomplete block design, with the same 13 (out of the 78
possible) time point comparisons appearing together on an array for each biological
replicate of each inoculation treatment. Comparisons between inoculation treatments
are made between days, and comparisons between time points are made partly within
arrays (for adjacent time points), partly within channels and partly between arrays
(other comparisons). Comparisons between the two dyes are completely confounded
with differences between the two channels. Incorporation of structure from the controlled
environment phase can be achieved by adding this directly, so that the structural
component becomes

Structural component: (Cabinet/Shelf/Set) + (Day/Array)*Channel

This structure is no longer balanced, and if any of the terms from the different phases
are completely confounded, it is important to include only one of them in the structural
component. In this example, there is no confounding, but the complexity of the structure
is greatly increased, and some terms may be difficult to estimate. If we believe
that variation within the CE cabinet shelves is small and can be ignored, then we could
simplify matters by creating a new factor to represent the biological replicates, RepCE
(equivalent to the Cabinet.Shelf combinations), and use this term in the explanatory
component. This uses the ideas of intra-block analysis (Section 11.6) within the
multi-phase context. The model can then be written as

Structural component: (Day/Array)*Channel

Explanatory component: [1] + RepCE + Dye + Inoculation*Time

The RepCE term is estimated within the Day stratum, using 3 df and resulting in only
6 residual df for that stratum. A dummy ANOVA table for this model is shown in
Table 19.2a.
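
The degrees of freedom quoted for the Day stratum can be checked with a short calculation.
It assumes, in line with the statement that comparisons between inoculation treatments are
made between days, that the Inoculation main effect is also estimated in this stratum:

```latex
\begin{align*}
\text{Day stratum df} &= 12 - 1 = 11, \\
\text{Residual df} &= 11
  - \underbrace{(3 - 1)}_{\text{Inoculation}}
  - \underbrace{(4 - 1)}_{\text{RepCE}} = 6 .
\end{align*}
```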



TABLE 19.2 

Dummy ANOVA Table for (a) Part 1 and (b) Part 2 of Microarray Design
Stage, Adjusting for Controlled Environment Phase (Case Study 19.2)

The second part of the design was used to make direct comparisons between inoculation
treatments and biological replicates at each time point. This part of the design
consisted of 13 blocks, one for the samples from each time point. Each block was 
processed on a different day and contained 12 arrays, again using a loop design, with 
the comparisons shown in Table 19.1b. Each array gave a comparison across inocula- 
tion treatments and across biological replicates within a time point, and all combina- 
tions of inoculation treatment and biological replicate appeared within each block, 
labelled by both dyes. Using the same strategy as in Part 1, the structure for this part 
of the design can be written as 

Structural component: (Day/Array)*Channel 

Explanatory component: [1] + Time + Dye + Inoculation*RepCE

A dummy ANOVA table for this model is shown in Table 19.2b. If this part of the 
design is considered alone, then the effects of time are completely aliased with differ- 
ences between days. But there is information in this part of the design on comparisons 
between inoculation treatments and it is possible to check for differences in response 
to inoculation across the biological replicates. Combined analysis of the two parts of 
the design simultaneously allows these two different aspects to be combined, using 
the model 

Structural component: (Day/Array)*Channel 

Explanatory component: [1] + Dye + Inoculation*Time*RepCE

This experiment generated a vast quantity of data, with measurements being made 
on over 32,000 genes on each array. Initial analysis was performed on a gene-by-gene 
basis, rather than trying to analyse for treatment effects across all genes simultane- 
ously. The lack of balance in the combined analysis requires the use of linear mixed 
models (Chapter 16). The careful consideration of the different constraints during each 
phase of the design, and the careful confounding of sources of variation between dif- 
ferent phases are crucial in ensuring that the analysis model is relatively easy to con- 
struct and interpret. 



Case Study 19.3: Designing a Sampling Scheme to 
Detect Variation within a Population 

Mapping populations are derived from inbred parental lines. In the case of recombi- 
nant inbred lines (RILs), the two parental lines are crossed and then the individual 
offspring is self-crossed (usually by a process of single seed descent, see Kearsey and 
Pooni, 1996) for a number of generations (usually eight or more) to produce a popula- 
tion of genetically stable offspring lines. The resulting population can be used to detect 
quantitative trait loci (QTLs, see Kearsey and Pooni, 1996), but it is helpful to first iden- 
tify traits where variation between lines is present. 

A new RIL population of 110 oilseed rape lines was grown in a glasshouse, using 
a RCBD with three replicates, with pots containing single plants as the experimental 
units. The aim of the study was to identify seed traits showing substantial varia- 
tion between lines; here, we focus on seed weight (measured in grams). Oilseed rape 
plants are structured, with pods growing on branches within plants, and branches
flowering in succession from the top of the plant, starting from the end of each
branch. A pilot study was done to identify structured sources of variation within 
plants, using all three replicates of 10 lines (chosen at random) and sampling eight 
pods along each of the first, third and fifth branches. The model for the pilot study 
was written as 

Response variable: SeedWeight 

Structural component: Rep/Unit/Branch/Pod 

Explanatory component: [1] + Line*BranchNo*Position 

where Position labels the relative position of a pod within the branch, BranchNo speci- 
fies the position of a branch within a plant. Unit labels the experimental units (pots) 
within replicates and the remaining factor names are self-explanatory. The average 
seed weight per pod was analysed using ANOVA; the results gave weak evidence for 
differences between lines and suggested that seed weight might be larger on the first 
branch, and possibly also for the first pods set within each branch, but there was no 
evidence of any interactions within the explanatory model. We concluded that a sam- 
pling scheme that takes account of the structure within plants should give a more 
efficient comparison across lines. The next task was to design a more comprehensive 
study that covered a much larger number of lines. 

The constraints on the second study were that a maximum of 600 pods could be 
processed, with average seed weight per pod derived from the total weight per pod 
divided by the number of seeds per pod (usually 8-14). Using the methods of Section 
16.4.1, we derived estimated variance components for each component of the structural 
model as shown in Table 19.3. The experimental units for the lines are the individual
pots, labelled as Rep.Unit combinations. For any structured sample, we can use the
method of Section 16.4.1 to predict the Rep.Unit stratum variance in an ANOVA table,
and use this to make power calculations for detecting line differences.

We consider a generic balanced scenario. We could sample n^ lines (chosen at ran- 
dom) using replicates, taking np pods from Wg branches. We would specify the 
branches to be used and the positions within each branch to give reasonable coverage 
of the plant. We can then use terms in the explanatory model to account for the system- 
atic differences between branches and positions found in the pilot study. Our model 
for this structure hence takes the form 

Structural component: Rep/Unit/Branch/Pod 

Explanatory component: [1] -i- Line -i- BranchNo -i- Position 



TABLE 19.3 



Estimated Variance Components for Each Term in Structural
Model for Measurements of Seed Weight (g) (Case Study 19.3)

and a dummy ANOVA table for this structure can be constructed, as shown in
Table 19.4a. Using the methods of Section 16.4.1, we can predict the stratum variances
(Table 19.4a), and our prediction of the Rep. Unit stratum variance takes the generic form

$s_U^2 = n_B n_P \hat{\sigma}_U^2 + n_P \hat{\sigma}_B^2 + \hat{\sigma}_P^2$ ,

where $\hat{\sigma}_U^2$, $\hat{\sigma}_B^2$ and $\hat{\sigma}_P^2$ are the estimated
variance components for units (pots), branches and pods, respectively. The variance of a
line prediction is based on the Rep. Unit stratum variance divided by the replication,
calculated as the number of pods sampled per line, i.e. $n_R \times n_B \times n_P$, giving

$s_U^2 / (n_R n_B n_P)$ .

As a starting point, we assess a scenario using all three replicates ($n_R = 3$) of 50 lines
(chosen at random from the set not used in the pilot study), taking two pods ($n_P = 2$)
from each of two branches ($n_B = 2$) and giving 600 pods in total. In this case, we might
use the first and third branches, taking the two pods from the end and middle of each
branch to maximize coverage of the plant structure. This structure matches a subset
of the data from the pilot study which sampled the same experiment so, as long as the
measurement methods have not changed in the interim, we can include the subset
from the pilot study to obtain data on 60 lines, giving 720 observations in total and the
dummy ANOVA table shown in Table 19.4b. In this case, the background variability for
lines is $s_U^2 = 0.0263$ (with 118 df) and so the SE of a line prediction is equal to

$\sqrt{0.0263 / (3 \times 2 \times 2)} = 0.0468$ ,

with SED = 0.0662. In the pilot study, the average seed weight was 0.4 g and the new
study is required to detect a change of ±20%, i.e. differences of the order of 0.16 g.
Using the methods of Section 10.3 to calculate the power for a RCBD, we find that the
power of this design is 66.9%, i.e. giving a probability of 0.669 of detecting a difference
in seed size of 0.16 g between two lines. This does not make any allowance for
adjustments required for multiple testing (Section 8.8).

TABLE 19.4

Dummy ANOVA Table for a Balanced Sample from the Oilseed Rape Study Using a RCBD (a) with
$n_R$ Replicates of $n_L$ Lines, Taking $n_P$ Pods from $n_B$ Branches from Each Plant, and (b) with 60 Lines
Each with Three Replicates, Taking Two Pods from Each of Two Branches (Case Study 19.3)
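The arithmetic for this first scenario can be reproduced in a few lines. The sketch below is illustrative only: it assumes the variance components for units, branches and pods are 0.0046, 0.0031 and 0.0017 (the values used in the stratum variance calculation later in this case study), and the function and variable names are our own.

```python
import math

# Estimated variance components (units, branches, pods) quoted in the case study
var_unit, var_branch, var_pod = 0.0046, 0.0031, 0.0017

def stratum_variance(n_branch, n_pod):
    """Predicted Rep.Unit stratum variance from the generic form:
    n_B * n_P * var_unit + n_P * var_branch + var_pod."""
    return n_branch * n_pod * var_unit + n_pod * var_branch + var_pod

# Scenario: three replicates, two branches per plant, two pods per branch
n_rep, n_branch, n_pod = 3, 2, 2
s2 = stratum_variance(n_branch, n_pod)
replication = n_rep * n_branch * n_pod      # pods sampled per line

se = math.sqrt(s2 / replication)            # SE of a single line prediction
sed = math.sqrt(2 * s2 / replication)       # SE of a difference between two lines

print(f"stratum variance = {s2:.4f}, SE = {se:.4f}, SED = {sed:.4f}")
```

Under these assumptions the sketch agrees with the values quoted above for the stratum variance, SE and SED.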

This sampling scheme utilizes only just over half of the available lines, giving a high
chance that we might omit some of the more extreme lines, and so we should also
assess an alternative scenario that samples a greater proportion of the lines. To do this,
we have to sacrifice replication at some other level in the structure, and this is best
done where the background variability is relatively low. The pod variance component
is the smallest value in Table 19.3, so suppose we sample only one pod from each of two
branches (i.e. $n_P = 1$), with the pod position fixed to avoid introducing variability due
to pod position. We can then sample the full set of 100 lines not used in the pilot study,
using 100 lines × 3 blocks × 2 branches × 1 pod = 600 pods. As we have overlap with
the structure of the pilot study, we can again incorporate those data, giving data on the
full set of lines. In this case, the bottom stratum is removed from the ANOVA table, as
we cannot assess variation between pods within branches when only a single pod is
sampled from each branch; however, the form of the rest of the table is unchanged. The
Rep. Unit stratum variance is now estimated as

$s_U^2 = 3\hat{\sigma}_U^2 + \hat{\sigma}_B^2 + \hat{\sigma}_P^2 = (3 \times 0.0046) + 0.0031 + 0.0017 = 0.0186$ ,

with line predictions having SE = 0.056 and SED = 0.079 (all with 218 df). This design
has power of 52.5% for detecting differences of size 0.16 g. For a reasonably small
decrease in power, we gain information on our full set of lines. In contrast, if instead
we sampled two pods from one branch on each plant, then the Rep. Unit stratum
variance would increase to 0.0355 (with 218 df) and the power would fall to 31.0%, which
is unacceptably low.
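To compare the three sampling schemes side by side, the following sketch approximates the power from the quoted stratum variances. It is not the method of Section 10.3: it swaps in a Normal approximation to the t-based calculation so that only the Python standard library is needed, and the function name and scenario labels are our own.

```python
import math
from statistics import NormalDist

def approx_power(stratum_var, replication, difference, alpha=0.05):
    """Approximate power of a two-sided test for a difference between two lines."""
    sed = math.sqrt(2 * stratum_var / replication)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Normal approximation; ignores the tiny chance of rejecting in the wrong direction
    return 1 - NormalDist().cdf(z_crit - difference / sed)

# (Rep.Unit stratum variance quoted in the text, pods sampled per line)
scenarios = {
    "two pods on each of two branches": (0.0263, 3 * 2 * 2),
    "one pod on each of two branches": (0.0186, 3 * 2 * 1),
    "two pods on one branch": (0.0355, 3 * 1 * 2),
}

power = {name: approx_power(s2, n, difference=0.16)
         for name, (s2, n) in scenarios.items()}

for name, value in power.items():
    print(f"{name}: approximate power = {value:.3f}")
```

Because the Normal critical value is a little smaller than the corresponding t value, the approximation runs slightly above the quoted powers of 66.9%, 52.5% and 31.0%, but it preserves the ranking of the three schemes.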

Other options can be investigated in a similar manner, but the second option of 
sampling one pod from the first and third branches of the 100 lines excluded from the 
pilot study appears to give the best option for getting information on the full set of 
lines with reasonable power and precision. This case study gives an example where a 
pilot study can be used to gain useful information that can also be incorporated into 
the main study. 



19.2 Choosing the Best Analysis Approach 

Different traditions for statistical analysis have developed for the analysis of data from
designed experiments (where there is careful control of the levels of the explanatory
variables used, usually resulting in a balanced structure) and for the analysis of data from
observational studies (where the explanatory variables are usually not under the control of the
researcher). These differences may seem illogical since both traditions work with the same
underlying linear model, but can be understood by considering differences in the aims of
the two types of study, and in the approaches used to collect the data. The biggest difference
between the two traditions is in the approach to building a model for the response.

In the analysis of a designed experiment the model will usually have been determined
by the design of the experiment. All of the terms incorporated into the design (in both the
explanatory and structural components of the model) are fitted and, for orthogonal
structures, all terms are retained in the model, whether or not the associated variance ratio is
statistically significant (see Section 8.2.4). This approach retains the residual mean square
in each stratum as an estimate of pure error, since it is based on variation between units
with the same treatment combination applied. It can be argued that model terms that are
not statistically significant can legitimately be merged with the residual; however, this may
not be true when the term is not significant because the statistical power is low. For
non-orthogonal explanatory structures, any non-significant terms would be dropped from the
model so that they do not influence the model predictions (Section 11.2.4). In both cases,
all significant terms and any terms marginal to them form the model used for prediction.

In the analysis of data from observational studies, data may have been gathered on many
different explanatory variables regarded as speculative or exploratory, with the statistical
analysis being used to screen for variables that are related to the response. In many cases,
there will be strong correlations within the set of explanatory variables and it would be
counter-productive to include the full set in the model, and so the subset of variables (and
interactions) that gives a good but parsimonious description of the response is selected
(see Sections 14.9 and 15.5). Predictions are then made from this selected model.

The differences in procedure arise from differences in the aims and construction of a
study. If an experiment has been designed, giving an orthogonal structure to investigate
the effect of certain explanatory variables on the response, then the full model is
pre-defined and all effects are estimable. In some cases, such as the presence of extraneous
covariates, it may be sensible to add unplanned terms to the model, but it will not be
necessary to drop terms from the model unless the structure is non-orthogonal. On the other
hand, if a study has collected data on a number of uncontrolled explanatory variables,
then there is no pre-defined model and it is appropriate to use model selection techniques
(adding and/or dropping terms) to find a parsimonious statistical model that gives a good
description of the response, and to identify which explanatory variables have some
influence on the response.

However, within each type of study, there are further analysis issues that should be
considered.

19.2.1 Analysis of Designed Experiments 

As discussed above, the analysis of any designed experiment should be defined by the
design chosen for the experiment. Certainly, the structural component of the model should
be determined by the design, and an initial explanatory model can be identified based
on the explanatory variables and structures (e.g. crossed or nested factorial structures)
considered during the construction of the design. Where qualitative factors are included,
some refinement of the analysis may be possible using contrasts to address questions more
specific than 'are the mean responses for the levels of the factor different?'. Of course, these
questions (and contrasts) should have been identified and used to select factor levels
during the construction of the design, preferably enabling the contrasts identified to be fitted
as an orthogonal set. If this is not the case, care must be taken in both specifying these
contrasts (ensuring that the statistical package will not orthogonalize the contrasts during
the model fitting process - a step that might change the meaning of each fitted contrast)
and in interpreting the results of the individual tests (as they may change according to the
order in which the contrasts are fitted, see Section 11.2).

With quantitative factors, the fitting of orthogonal polynomial contrasts to explore the
shape of the response can be planned during the construction of the design, using the
expected maximum order of the polynomial to determine the number and spread of
the factor levels included. Polynomial contrasts generally provide a good assessment of
whether the response is linear or more complex, but extracting information about the
fitted polynomial is quite challenging in most statistical software. Therefore, the fitting of
polynomial contrasts within the analysis of a designed experiment is often a precursor
to translation of the model into the regression modelling framework. This can be
challenging, as it requires that proper account is taken of the structural component of the
model (see Sections 11.6 and 15.3), as well as ensuring that all qualitative explanatory
factors and interactions are retained in the model (see Chapter 15). But analysis within the
regression modelling framework does make it possible to use more complicated models,
such as the non-linear models introduced briefly in Section 17.3. As an alternative, linear
mixed models (Chapter 16) allow direct incorporation of the structural component of the
model in addition to regression relationships within a general explanatory structure.
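As an illustration of how orthogonal polynomial contrasts relate to the chosen factor levels, the sketch below derives contrast coefficients by Gram-Schmidt orthogonalization of the powers of the levels. This is one standard construction and not necessarily the one used by any particular statistical package; the function name and the four equally spaced levels are hypothetical.

```python
from fractions import Fraction

def orthogonal_polynomial_contrasts(levels, max_order):
    """Gram-Schmidt on powers of the levels; returns one coefficient list per order."""
    x = [Fraction(v) for v in levels]
    basis = [[Fraction(1)] * len(x)]              # constant term (order 0)
    for order in range(1, max_order + 1):
        vec = [xi ** order for xi in x]
        for b in basis:
            # Remove the projection onto each lower-order vector
            coef = sum(v * bi for v, bi in zip(vec, b)) / sum(bi * bi for bi in b)
            vec = [v - coef * bi for v, bi in zip(vec, b)]
        basis.append(vec)
    return basis[1:]                              # drop the constant term

# Hypothetical quantitative factor with four equally spaced levels
linear, quadratic, cubic = orthogonal_polynomial_contrasts([1, 2, 3, 4], max_order=3)
print("linear:   ", [str(c) for c in linear])
print("quadratic:", [str(c) for c in quadratic])
print("cubic:    ", [str(c) for c in cubic])
```

With k levels, at most k - 1 contrasts can be formed, which is why the expected maximum order of the polynomial constrains the number of levels in the design. Exact fractions are used so that orthogonality can be checked without rounding error, and the same function accepts unequally spaced levels.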

Finally, it is always important to investigate whether the observed response might have
been influenced by any unplanned (extraneous) sources of variation. These might be
noticed by the experimenter (e.g. pigeon grazing in one corner of a field) or detected
during statistical analysis (e.g. examining residuals according to their physical position in the
experimental layout as in Figure 11.4). These effects can be incorporated in the analysis by
including a measure of the unplanned quantity for each experimental unit as a factor or
covariate (Sections 11.5 and 15.4).



19.2.2 Analysis of Observational Studies 

When identifying the aims of any observational study, the primary response variable of 
interest should be determined, together with the set of potential explanatory variables to 
be measured. In most cases, some (or even all) of the explanatory variables will be quan- 
titative, and observations should be collected to cover the full range of values normally 
observed for each variable. With quantitative explanatory variables present, a regression 
model will usually be the first approach considered, although this approach should still 
take account of any structure in the study (see Section 15.3). Before analysis, it is important 
to understand the extent of correlations among the explanatory variables (see Sections 
14.1 and 14.7). One possible preliminary step is to fit simple linear regression models to 
evaluate the relationship of the primary response variable with each individual quantita- 
tive explanatory variable, and to fit a simple factor model for any qualitative explanatory 
variables. With simple data sets, this may be all that is required, but various extensions 
will usually need to be considered such as those listed below. 

• For quantitative variables, is the relationship a straight line or would some form 
of curved relationship be more appropriate? Choice of the form of curved rela- 
tionship depends on whether we just want to describe the response (polynomial 
models may be adequate) or want some deeper understanding of the underlying 
mechanism (some form of non-linear model may be more appropriate). 

• Are there multiple explanatory variables? For a small number of variables, it might
be possible to fit and compare models for all combinations of the explanatory
variables (excluding interactions), but for larger numbers some sort of model
selection process (such as stepwise regression, Sections 14.9 and 15.5) might be
needed as an initial step, before further exploring the models identified in more
detail.

• Are there likely to be interactions among the explanatory variables? This depends
on the explanatory variables present, and may require an understanding of the
underlying biological science. Recall that interactions between quantitative and
qualitative explanatory variables (variates and factors) correspond to regression
with groups, and these regressions may be parallel or not (Chapter 15).
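The point about fitting all combinations of explanatory variables can be made concrete by simply listing the candidate models. The sketch below only enumerates the main-effects models (it does not fit them), and the variable names are hypothetical.

```python
from itertools import combinations

def all_main_effect_models(variables):
    """Return every subset of the variables, from the null model to the full model."""
    models = []
    for size in range(len(variables) + 1):
        for subset in combinations(variables, size):
            models.append(subset)
    return models

candidates = ["rainfall", "temperature", "soil_pH"]   # hypothetical variable names
models = all_main_effect_models(candidates)
for terms in models:
    print("response ~ " + (" + ".join(terms) if terms else "1"))
print(len(models), "candidate models")

# The number of candidate models doubles with each extra variable
for k in (5, 10, 20):
    print(k, "variables ->", 2 ** k, "models")
```

With three variables there are only eight models to compare, but the count grows as 2 to the power of the number of variables, which is why a selection process becomes necessary as an initial step for larger sets.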



19.2.3 Different Types of Data 

Much of the data that we collect from either designed experiments or observational
studies will satisfy the assumptions associated with the linear model - i.e. that the model
deviations are homogeneous (constant variability), that the deviations follow a Normal
distribution, and that the deviations are mutually independent (see Sections 4.1 and 12.1).
We have introduced diagnostic approaches (Chapters 5 and 13) that can be used to check
that the first two of these assumptions are satisfied for the statistical analysis approaches
covered in this book. The way in which the data are collected usually determines whether
deviations can reasonably be expected to be independent (see Section 5.2.2). It should be
obvious when planning a study whether this is likely to be an issue, and the data collection
procedure can either be modified to ensure independence, or a more complex statistical
analysis can be planned; for example, linear mixed models (Chapter 16) can allow for
different patterns of correlation between observations.

Where the assumptions of homogeneity of variance or Normality are not met, then one
approach is the use of transformation of the data prior to analysis (Chapter 6). This
provides some challenges in the presentation of the results (see Section 6.3), but otherwise the
analysis proceeds as for an untransformed variable. For some types of discrete data, such
as unconstrained counts, or counts expressed as a proportion of some fixed total,
transformation is unlikely to be successful and a better alternative is available. These forms of data
are likely to follow either Poisson (counts) or Binomial distributions (proportions), and for
these types of response we can fit models for both quantitative and qualitative explanatory
variables within the GLM framework introduced briefly in Chapter 18.

Finally, studies might involve the collection of multiple response variables as well as
multiple explanatory variables. While each response variable might be analysed individually,
using the methods identified in this book, it might be more useful to analyse the set of
response variables together, taking account of the associations and relationships between
them. This requires the application of multivariate statistical methods, which are beyond
the scope of this book.



19.3 Presentation of Statistics in Reports, Theses and Papers 

Having carefully designed a study, collected the data and performed an appropriate
statistical analysis to address the questions and hypotheses that motivated the study, we
usually need to summarize the statistical aspects of the study in a report, thesis or paper.
Statistical information should appear in both the Materials and Methods and the Results
sections of any publication, and we discuss each of these separately.



19.3.1 Statistical Information in the Materials and Methods 

In the Materials and Methods section of a publication, the aim is to provide sufficient sta- 
tistical information to allow the reader to be able to understand the structure of the study 
and repeat the statistical analysis. 

The first step is to provide details about the design of the study. For a designed experi- 
ment this should include information about the treatments and explanatory structures 
as well as the physical structure of the experiment, identifying any practical constraints 
associated with performing the experiment. Where a standard form of design has been 
used (such as a randomized complete block design or a Latin square design), it will usu- 
ally be adequate to state the form of design by name with the level of replication (e.g. two 
replicates of a 3 x 3 Latin square design). Where a more complex or non-standard form 
of design has been used, it is necessary to provide more detail, and a diagram or table 
showing the structure of the design, possibly also including the treatment allocation, can 
be helpful (e.g. see Table 19.1). It is important to identify the form of the experimental unit 
that was used for each treatment factor, including the dimensions of the unit if this is 
otherwise unclear. For some studies, these dimensions might vary between experimental 
units (e.g. field size in Case Study 19.1), in which case it might be helpful to state the range 
of values. Where different experimental units are used for different explanatory vari- 
ables, such as in Example 9.2, it is important to indicate the number of smaller experimen- 
tal units that are combined to produce each larger experimental unit, as well as to clearly 
identify the explanatory variables that are applied to each type of experimental unit. It 
is also important to indicate the numbers of levels of each variable (the actual levels may 
have already been described as part of the biological methods, but might be usefully iden- 
tified here as well), how variables are combined (e.g. as a crossed or nested factorial struc- 
ture), whether there are additional control treatments, and, most importantly, the number 
of replicates of each treatment combination. It should be possible for the reader to relate 
the choice of experimental treatments directly to the stated aims of the study, but it may 
sometimes be useful to clarify the precise hypotheses being tested, and to indicate how 
these relate to the particular treatments or treatment combinations included. Such infor- 
mation may also be used to justify the inclusion of contrasts within a treatment factor. 

For observational studies, similar information needs to be provided, but with more 
focus on how the samples were selected, possibly identifying the sampling frame and any 
constraints on the selection of individual samples. Any broad differences in the character- 
istics of the samples (related to spatial location, time of sampling, environmental variables, 
etc.) should be reported, together with the structure of the sample. For example, in a study 
that sampled fruit from trees in an orchard for an assessment of pesticide levels, we would 
need to indicate how the fruit were selected within each tree, possibly taking account of 
different locations within the tree (e.g. branch number or spatially defined parts of the 
tree), as well as how the trees were selected within the orchard, and, if the study used fruit 
from different orchards, how the orchards were selected. Design Case Study 19.1 contains 
some elements of an observational study, as well as elements of a designed experiment, 
illustrating how studies can be a hybrid of these two types. 

Having described the structure and design of the study, the next part of the statistical
methods relates to the data that have been collected. Details should be given of the
number of measurements or assessments made on each experimental unit (we might measure
each individual plant within a plot of multiple plants, or take several measurements
on each experimental unit), and any manipulation of these data prior to analysis. This
would include the calculation of any quantity derived from several variables measured
on each experimental unit (e.g. harvest index). If some transformation of the data has been
performed prior to analysis in order to satisfy the model assumptions (primarily
homogeneity of variance and Normality), then the form of data transformation should be stated,
together with the reason for the transformation.

The final section of statistical methodology relates to the method(s) of analysis that have 
been applied to the data to extract answers to the original questions, i.e. to test the statisti- 
cal hypotheses. Where a standard method has been used, such as simple linear regression 
to assess the relationship between a response and explanatory variate or ANOVA to sum- 
marize data from a designed experiment, it will often be sufficient to just state the method 
without the need for further referencing. In other cases, it is helpful to state the full model 
fitted, including both explanatory and structural components; these can be described in 
the text (e.g. a crossed structure with two factors), but it is often clearer to explicitly give 
the symbolic form, including details of any contrasts fitted to extract information about 
particular treatment comparisons. Where model selection approaches have been used, the 
description should indicate the model terms or sequence of models being considered, as 
well as the selection strategy and selection criterion used to identify the best model. Where 
less standard analysis approaches are used, it will usually be more sensible to provide a 
reference to a good applied statistics text describing the method than to give details within 
the publication. 

In addition, it is useful to state the statistical software used to perform the analysis, 
including version number and the relevant functions or procedures. However, this infor- 
mation should be given in addition to, not instead of, the information on the methods 
listed above. It is a good practice to keep a safe copy of the analysis program and data file 
used to produce the results given in the paper, in case of any future revision or queries. 



19.3.2 Presentation of Results 

The best approach to the presentation of the results of statistical analyses, and the quantity 
of information required, varies considerably depending on the type of analysis that has 
been done. For simple statistical hypothesis tests, such as a two-sample t-test (see Section 
2.4.2), it is sufficient to present the test statistic, together with the associated degrees of 
freedom and the observed significance level, within the text. Interpretation of the test 
result (whether to reject or fail to reject the null hypothesis) might then follow, together 
with information about the mean values for the different treatments, and hence the direc- 
tion of the difference and the biological interpretation. 

For the analysis of variance of a designed experiment or the fitting of a regression 
model, more extensive results need to be presented. It is usually sufficient to identify the 
model terms used to form the predictive model (i.e. terms that are statistically signifi- 
cant and those terms marginal to them), and then to present predictions. At this stage, 
a table or a graph provides a succinct yet powerful summary of the analysis, and the 
choice between these two forms usually depends on the complexity and quantity of the 
information to be presented. Tables may be the better choice for large, more complex sets 
of means (as might be produced from a multi-factorial designed experiment) and graphs 
are often better for showing simpler patterns (such as for a simple designed experiment 
or regression model).

When presenting the results from the analysis of variance of a designed experiment
it is usually not necessary to present the full ANOVA table, although when the model is
complex, a table showing the strata, model terms and their associated df can be a useful
supplement to the description in the Materials and Methods section. F-tests for the
variance ratios associated with each explanatory term can be quoted in the text (test statistic
with df and observed significance level), with tables of means or simple graphs used to
show the pattern of responses for the predictive model. Where a high-order interaction
term has a significant F-test, then it is usual to show predictions for the levels of the
associated main effects alongside the predictions for the interaction (i.e. for the combinations of
levels of different factors) in a multi-way table. Careful thought is needed about the choice
of factors to label the rows and columns of the table, as the human eye is good at seeing
patterns down columns of numbers, but less good across rows. For a two-way table, one
strategy is to assign columns to the factor which has larger differences between
predictions for different levels, with rows assigned to the other factor. The more subtle
differences between predictions for the levels of the row factor (within each level of the column
factor) are then more easily seen down each column. There is a natural ordering of the
levels for quantitative factors, but careful ordering of levels for qualitative factors can make
it easier to see patterns.

Graphs can give a good illustration of two-factor interactions, with the levels of one
factor labelling positions on the horizontal axis and different colours or symbols used
to indicate the points for the levels of the other factor. Point plots are preferable to bar
charts, and it can be useful to draw lines between the points for each level of the second
factor (as we did from Chapter 8 onwards), although it is important to realize that these
lines do not imply that we can interpolate between levels. Again, there will be a natural
ordering of factor levels on the horizontal axis for quantitative factors but where the factor
is qualitative, careful ordering of the levels can enhance the interpretation of the
interaction. The best choice of factor used to label the horizontal axis depends on context, and it
is usually worth creating both of the possible graphs to identify the option that provides
the clearest interpretation. Graphs can also be used to illustrate three-way interactions
by creating separate plots to illustrate the interaction between two of the factors for each
level of the third factor; however, if the patterns are complex, then a table may make
better use of space.

Whether presenting predictions using tables or graphs, it is vital to also present
information about the precision of the predictions or of differences between them. Where interest
is in the estimated response for a particular level of a factor (a single prediction), then
the estimated SE is the appropriate measure of precision, either as a summary or used to
construct a confidence interval (usually at the 95% level). If the predictions are presented
graphically, then bars for confidence intervals (or SEs) can be added to the point for each
prediction, with the form of error bar and its associated df clearly stated in the figure
legend. For a balanced design with equal replication, the SEs will be the same, and so
only this common value needs to be presented with a table of predictions. However, most
studies are more concerned with assessing differences between pairs of predictions, and
so the presentation of the estimated SE of the difference (SED) for each comparison, with
its associated df, is more useful. Alternatively, the LSD can be derived for any comparison
and presented with its associated df and chosen level of significance stated. Again, for a
balanced design with equal replication, there will be only one SED or LSD value to be
presented and, in this case, it is best to just add a single SED or LSD bar to a graph of predicted
values, suitably positioned to allow easy visual assessment of the comparison(s) of most
interest (e.g. Figure 4.6).

More care is required where a transformation has been applied to the data prior to
analysis (see Chapter 6). Here, the analysis has been applied to the transformed response, so
that assessment of the significance of differences between predictions must be made on
that transformed scale using the appropriate SEDs or LSDs. However, we often want to
interpret these differences on the scale on which the data were originally measured. The
usual approach is to present both the predictions on the transformed scale, with
appropriate SEDs/LSDs (plus associated df), and the back-transformed predictions. This is
relatively easy when presenting the predictions in a table, as the back-transformed values
can be presented in parentheses alongside the predictions. For graphical presentation, one
option is to plot the predictions (with SEDs or LSDs) on the transformed scale but with
the vertical axis labelled on the back-transformed scale. Where this is not possible, or does
not provide a clear representation of the pattern of response, it may be better to present
the back-transformed means graphically with the means on the transformed scale (and
their associated precision) shown in an accompanying table. Where interest is in single
predictions, so that confidence intervals give an appropriate measure of precision, then
confidence intervals can be calculated on the transformed scale and both the mean value
and the confidence limits back-transformed for graphical display.
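The back-transformation of a confidence interval can be sketched as follows for a natural-log transformation. All numbers here are hypothetical (they are not taken from any example in this book), and the t critical value is an assumed two-sided 5% value on 20 df.

```python
import math

# Hypothetical prediction and SE on the natural-log scale
mean_log = 2.10
se_log = 0.15
t_crit = 2.086          # assumed two-sided 5% critical value of t on 20 df

# Confidence limits are calculated on the transformed (log) scale ...
lower_log = mean_log - t_crit * se_log
upper_log = mean_log + t_crit * se_log

# ... and then the mean and both limits are back-transformed for display
mean_back = math.exp(mean_log)
lower_back = math.exp(lower_log)
upper_back = math.exp(upper_log)

print(f"log scale:       {mean_log:.3f} ({lower_log:.3f}, {upper_log:.3f})")
print(f"original scale:  {mean_back:.2f} ({lower_back:.2f}, {upper_back:.2f})")
```

The interval is symmetric on the log scale but asymmetric after back-transformation, which is why the limits themselves, rather than a back-transformed SE, are what should be plotted.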

For a regression model with a single explanatory variate, a graph of the fitted model
imposed on a scatter plot of the observations will usually be helpful to demonstrate
the fit of the model. This is a useful approach for a linear (Chapter 12) or non-linear
(Chapter 17) model, as well as for models with additional explanatory factor(s) (Chapter
15). The equation of the fitted model should be presented alongside the graph, showing
the estimated parameters and their standard errors (with the associated df), and
appropriate goodness-of-fit statistics (Section 14.8). A graphical representation of the fitted
model is more difficult for multiple regression models, but a plot of the observations
against the fitted values can be helpful. For models with a large number of parameters,
presentation of the estimated parameters and their SE in a table may be more effective
than trying to list them within the text (e.g. Table 14.14). Comparisons between models
within a nested sequence (Chapters 14 and 15) can be reported in terms of the F-tests
from the sequential ANOVA table, presenting each test statistic with its df and observed
significance level.

Issues in the presentation of results from fitting GLMs (Chapter 18) closely mirror those 
associated with the analyses of transformed data, with parameters being estimated on the 
scale defined by the link transformation. Results are best presented in terms of predictions 
on the back-transformed scale for qualitative variables or plots of the fitted model on the 
back-transformed scale for quantitative variables, with information about the significance 
of parameter estimates, or comparisons between predictions, presented on the scale of the 
link transformation. 
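For a Binomial GLM with a logit link, the same idea might be sketched as below. The predicted value and SE on the link scale are hypothetical, the function name is our own, and 1.96 is used as an approximate Normal critical value for a 95% interval.

```python
import math

def inverse_logit(eta):
    """Back-transform a value from the logit (link) scale to a proportion."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical prediction and SE on the logit scale
eta_hat, se_eta = -0.85, 0.30
z = 1.96                     # approximate 95% Normal critical value

# The interval is formed on the link scale, where the estimates are made ...
limits_link = (eta_hat - z * se_eta, eta_hat + z * se_eta)

# ... and the prediction and limits are back-transformed for presentation
p_hat = inverse_logit(eta_hat)
p_low, p_high = (inverse_logit(v) for v in limits_link)

print(f"link scale:  {eta_hat:.2f} ({limits_link[0]:.2f}, {limits_link[1]:.2f})")
print(f"proportion:  {p_hat:.3f} ({p_low:.3f}, {p_high:.3f})")
```

Working on the link scale keeps the back-transformed limits inside the range 0 to 1, which would not be guaranteed if the interval were built directly on the proportion scale.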

Einally, we return briefly to the issue of the precision with which numerical values should 
be presented, initially discussed in Section 2.6. The conventions described there provide a 
set of guidelines for the presentation of numerical results. We introduced the convention 
that test statistics, critical values and observed significance levels should be presented to 
three decimal places, generally providing sufficient detail for interpretation of the tests. 
We also introduced the concept of identifying the granularity of the observed response, 
and that other statistics should be presented with a precision defined in terms of this 
granularity. Predictions and estimated parameters should be presented to one more signif- 
icant figure than the granularity of the original data, and variances, standard deviations 
and standard errors (including LSDs) should use two more significant figures. Following
these guidelines, with a sprinkling of common sense thrown in, should ensure that you
present your numerical results with sufficient detail to allow your reader to understand
and appreciate the analysis and interpretation you present. 
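
One way to apply these conventions consistently is a small formatting helper. The sketch below is only one reading of the guidelines: it assumes that the granularity of the response can be expressed as a number of decimal places, and the function name and the `kind` labels are our own invention for illustration.

```python
def format_result(value, data_decimals, kind):
    """Format a statistic following the precision conventions above.

    data_decimals: decimal places of the recorded response (its granularity).
    kind: "test" for test statistics, critical values and significance
          levels (three decimal places); "estimate" for predictions and
          estimated parameters (one more place than the data); "se" for
          variances, standard deviations, standard errors and LSDs
          (two more places than the data).
    """
    if kind == "test":
        decimals = 3
    else:
        extra = {"estimate": 1, "se": 2}
        decimals = data_decimals + extra[kind]
    return f"{value:.{decimals}f}"


# Hypothetical values for a response recorded to one decimal place.
print(format_result(12.3456, 1, "estimate"))
print(format_result(0.98765, 1, "se"))
print(format_result(4.10282, 1, "test"))
```

A helper of this kind does not replace the common sense mentioned above; it only keeps the number of reported digits uniform within a table.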



19.4 And Finally... 

We hope that we have provided a clear understanding of a wide range of statistical methods
to underpin your scientific research and education, and have provided you with sufficient
knowledge to allow you either to design your own studies and analyse the data that
you collect, or to be able to instigate a fruitful collaboration with a professional applied
statistician. We have all worked in organizations in the latter role and are well aware of the
value, enjoyment and success that such collaborations can bring to both sides.

Probably the most valuable single piece of advice we can give is that you should look
critically at your statistical analysis, as it is easy to fit statistical models that do not make
biological sense and which may therefore give misleading results. Always make sure that
you understand the model you have fitted and cross-check that the results are consistent
with simple summaries of your study. If your results contradict previous work, then check
for a mistake in your data processing or analysis procedures before celebrating your new
discovery!

Our intention is to maintain and expand the online software resources associated with
this book (www.stats4biol.info), and we hope that you continue to find these and the contents
of the book useful as you pursue the application of statistical approaches to add value to
your research and study.

Appendix A: Data Tables



TABLE A.1

Measurements of Weight (W), Length (L), Diameter (D), Moisture Content (M) and Hardness Index
(H) for 190 Seeds (Example 12.1A and File triticum.dat)

Seed      W     L     D      M       H    Seed      W     L     D      M       H
   1  30.15  3.27  2.09  10.27  -16.63      39  25.42  3.01  1.93  10.68  -22.04
   2  35.51  3.65  2.34  10.61   -8.27      40  30.28  3.45  2.21  10.37    2.25
   3  29.16  3.36  2.15  10.27  -21.45      41  27.41  3.30  2.11  10.41   -3.94
   4  16.82  2.77  1.79  11.05    4.13      42  38.75  3.84  2.47  10.68  -20.41
   5  23.42  2.78  1.80  10.02   -2.05      43  19.69  3.14  2.01  10.28  -11.05
   6  31.77  3.37  2.15  10.34  -41.78      44  24.80  3.09  1.98  10.67  -30.84
   7  16.45  2.52  1.66  10.64   -5.33      45  33.27  3.39  2.17  10.79  -21.12
   8  32.89  3.48  2.23  10.44  -13.91      46  22.43  2.91  1.87  10.47  -28.66
   9  22.55  3.17  2.03  10.28  -10.87      47  49.47  4.12  2.66  10.59  -42.47
  10  28.03  3.20  2.05  10.22  -16.28      48  22.30  3.07  1.97  10.97   -1.61
  11  32.27  3.58  2.29  10.32  -12.81      49  27.29  3.42  2.19  10.37   11.24
  12  40.62  3.97  2.56  10.40   10.46      50  34.26  3.63  2.33  10.39   -4.45
  13  29.28  3.54  2.27  10.64  -32.43      51  24.30  3.06  1.96  10.85   11.87
  14  22.68  3.23  2.07  10.78  -19.04      52  24.55  3.24  2.07  10.30  -21.16
  15  29.78  3.53  2.26  10.39  -25.78      53  19.06  2.89  1.86  10.25   -9.72
  16  27.16  3.05  1.96  10.49  -34.65      54  27.04  3.18  2.04  10.36   -6.46
  17  17.94  2.86  1.85  10.37   -5.24      55  29.03  3.36  2.15  10.67   -8.63
  18  20.93  3.08  1.97  10.97   -6.41      56  36.38  3.64  2.33  10.59  -19.23
  19  30.78  3.48  2.23  10.83   -4.09      57  24.30  3.22  2.06  10.78   -0.93
  20  45.85  3.78  2.43  10.37  -18.00      58  22.68  3.11  1.99  10.38  -22.53
  21  30.78  3.27  2.09  10.70   -3.21      59  33.89  3.51  2.25  10.64   -5.04
  22  33.64  3.54  2.27  10.52  -21.18      60  33.39  3.45  2.21  10.65  -18.14
  23  34.89  3.47  2.22  10.74  -18.36      61  25.54  3.45  2.21  10.28    7.83
  24  22.55  3.09  1.98  10.55   -9.35      62  32.52  3.67  2.35  10.51  -13.93
  25  28.28  3.38  2.16  10.51   -7.74      63  31.27  3.41  2.18  10.30  -24.13
  26  25.04  3.07  1.97  10.06   -7.46      64  31.15  3.40  2.17  10.48    3.35
  27  39.25  3.80  2.44  10.76  -28.34      65  19.19  2.93  1.89  10.59   -5.89
  28  26.79  3.09  1.98  10.26    6.31      66  23.42  3.22  2.06  10.34   -1.76
  29  24.55  3.10  1.99  10.36    4.18      67  25.17  2.87  1.85  10.54  -20.72
  30  33.02  3.58  2.29  10.56  -22.34      68  28.91  3.10  1.99  10.56  -11.31
  31  24.30  2.74  1.78  10.54  -19.77      69  27.79  3.38  2.16  10.45   -6.73
  32  25.92  3.10  1.99  10.67  -28.64      70  34.76  3.45  2.21  10.70  -36.98
  33  34.51  3.57  2.29  10.61   -9.14      71  29.53  3.24  2.07  10.05  -20.59
  34  28.16  3.00  1.93  10.21  -15.51      72  24.80  3.30  2.11  10.54   -7.83
  35  28.16  3.09  1.98  10.52  -11.11      73  29.65  3.31  2.12  10.52  -25.70
  36  19.81  2.85  1.84  10.94  -14.08      74  29.90  3.36  2.15  10.42  -34.76
  37  27.16  3.39  2.17  10.37  -19.96      75  21.81  2.82  1.82  10.71   -8.90
  38  16.07  2.61  1.71  10.31   -5.11      76  25.29  3.12  2.00  10.77   15.07
  77  33.77  3.43  2.19  10.61  -24.60     121  31.90  3.33  2.13  10.56   -5.52
  78  37.13  3.81  2.45  10.70  -15.98     122  24.05  2.73  1.77  10.77  -30.30
  79  31.52  3.43  2.19  10.55  -14.59     123  35.88  3.63  2.33  10.65  -17.86
  80  32.65  3.48  2.23  10.50  -22.41     124  41.12  3.83  2.46  10.70   -8.30
  81  28.16  3.55  2.27  10.79  -42.05     125  30.90  3.71  2.38  10.88  -25.06
  82  25.67  3.33  2.13  10.65    8.38     126  23.67  3.18  2.04  10.65   -1.61
  83  16.07  2.46  1.63  10.69   -4.47     127  22.43  2.92  1.88  10.54  -11.17
  84  31.90  3.34  2.14  10.71  -14.63     128  26.79  3.23  2.07  10.59  -23.49
  85  40.99  4.00  2.58  10.68   -2.88     129  43.49  3.96  2.55  10.67  -30.67
  86  22.68  3.19  2.04  10.20    6.77     130  27.66  3.42  2.19  10.65   -7.37
  87  24.55  3.17  2.03  10.90  -19.53     131  30.28  3.49  2.23  10.70    4.89
  88  31.77  3.61  2.31  10.62   -9.93     132  21.81  3.20  2.05  10.91    1.78
  89  30.90  3.63  2.33  10.51  -10.35     133  33.52  3.46  2.21  10.30  -21.35
  90  29.65  3.56  2.28  10.66   16.00     134  17.19  2.81  1.82  10.72  -17.29
  91  33.39  3.34  2.14  10.38  -21.67     135  15.57  2.64  1.72  10.67  -22.46
  92  27.54  3.49  2.23  10.29  -10.27     136  23.80  3.10  1.99  10.34  -20.90
  93  35.14  3.82  2.45  10.24    4.35     137  35.26  3.60  2.31  10.77  -14.34
  94  37.13  3.64  2.33  10.21  -31.31     138  47.47  4.13  2.67  10.72  -27.76
  95  19.06  2.94  1.89  10.48   22.05     139  25.17  2.90  1.87  10.28   26.26
  96  31.65  3.31  2.12  10.01  -10.07     140  41.99  3.86  2.48  10.38  -20.44
  97  25.92  3.01  1.93  10.55   -4.62     141  29.16  3.62  2.32  10.57   -7.22
  98  25.17  3.09  1.98  10.77  -19.43     142  18.57  3.08  1.97  10.54  -26.78
  99  21.06  2.90  1.87  10.61   -6.36     143  24.17  3.31  2.12  10.98  -24.60
 100  27.66  3.12  2.00  10.54   -5.06     144  35.88  3.66  2.34  10.14   -1.33
 101  32.27  3.15  2.02  10.70  -34.37     145  20.56  2.90  1.87  10.85    2.23
 102  22.93  2.84  1.83  10.21   -6.84     146  30.28  3.44  2.20  10.26  -21.36
 103  29.41  3.03  1.94  10.62  -10.74     147  28.41  3.35  2.14  10.39  -16.57
 104  31.03  3.54  2.27  10.43    1.11     148  24.80  2.88  1.86  10.44  -14.64
 105  25.54  3.23  2.07  10.52  -29.74     149  28.53  3.45  2.21  10.58  -10.61
 106  32.77  3.36  2.15  10.49   11.20     150  32.52  3.46  2.21  10.71   -8.90
 107  34.89  3.64  2.33  10.90  -14.96     151  26.91  2.97  1.91  10.63   -6.45
 108  22.93  2.87  1.85  10.97  -20.46     152  25.42  3.01  1.93  10.57  -38.91
 109  35.26  3.54  2.27  10.58   -5.39     153  33.27  3.61  2.31  10.49   -0.79
 110  21.81  3.16  2.02  10.53  -14.53     154  29.16  3.34  2.14  10.36  -18.54
 111  31.27  3.49  2.23  10.61  -31.34     155  25.17  3.09  1.98  10.74  -28.62
 112  32.02  3.50  2.24  10.78  -20.25     156  28.53  3.17  2.03  10.52  -21.63
 113  29.41  3.49  2.23  10.25   -9.70     157  20.68  2.92  1.88  10.81  -47.75
 114  25.04  3.11  1.99  10.34  -10.69     158  23.42  2.76  1.79  10.42   -7.09
 115  37.50  3.83  2.46  10.70   -4.61     159  27.41  3.06  1.96  10.98  -24.94
 116  17.94  2.74  1.78  10.84  -30.76     160  22.18  3.06  1.96  10.68  -21.83
 117  32.15  3.26  2.09  10.74  -11.28     161  28.03  3.22  2.06  10.29  -19.64
 118  26.29  3.33  2.13  10.57  -21.02     162  35.01  3.36  2.15  10.66  -22.98
 119  27.29  3.24  2.07  10.59    0.42     163  29.03  3.10  1.99  10.45  -22.79
 120  32.15  3.47  2.22  10.34  -19.32     164  23.92  2.95  1.90  10.82  -10.84
 165  26.54  3.32  2.12  10.88  -15.84     178  37.38  3.67  2.35  10.73  -17.74
 166  36.63  3.59  2.30  10.50   -5.22     179  35.01  3.65  2.34  10.34  -15.01
 167  31.90  3.51  2.25  10.42   -8.95     180  28.03  3.30  2.11  10.83   -5.60
 168  22.80  2.90  1.87  10.85   -5.56     181  34.14  3.51  2.25  10.56   -4.95
 169  28.28  3.44  2.20  10.46  -20.97     182  26.79  3.34  2.14  10.55  -19.84
 170  39.12  3.69  2.37  10.08  -19.53     183  38.13  3.65  2.34  10.31  -27.10
 171  28.16  3.51  2.25  10.19  -27.23     184  31.90  3.37  2.15  10.40  -16.98
 172  38.88  3.71  2.38  10.64  -17.95     185  33.64  3.56  2.28  10.73  -34.58
 173  23.92  3.07  1.97  10.34   -3.73     186  27.29  3.04  1.95  10.55   -7.29
 174  20.81  2.90  1.87  10.54   -1.51     187  27.66  3.60  2.31  10.88  -22.68
 175  29.03  3.22  2.06  10.36   -6.98     188  26.54  3.58  2.29  10.49    3.30
 176  19.69  3.02  1.94  10.38    3.78     189  30.90  3.17  2.03  10.37  -17.83
 177  33.27  3.47  2.22  10.48  -14.60     190  18.94  2.45  1.62  10.08   -7.06

Source: Data from H.-C. Jing and K. Hammond-Kosack, Rothamsted Research.

TABLE A.2



Measurements of Air Temperature (°C) for 100 Days during 2006 from a Standard Glass Mercury
Thermometer (M) and a New Electronic Thermistor (ET) (Exercise 12.2 and File airtemp.dat)

Day     M    ET    Day     M    ET    Day     M    ET
  1   5.3   5.3    127  12.1  12.4    244  17.9  17.4
  7   6.6   5.5    132  12.6  12.7    246  20.7  19.5
  8   8.9   8.7    134  11.7  11.4    251  20.3  19.6
 13   6.9   6.7    139  11.3  11.9    253  18.4  18.3
 15   8.8   8.6    141  12.0  11.6    258  16.3  16.6
 20   2.0   1.8    148  10.0  10.3    260  17.0  17.5
 22   0.5  -0.1    153  13.6  13.6    261  21.0  20.5
 27   3.9   3.8    155  19.1  19.3    265  16.0  16.1
 29  -0.5  -0.7    160  28.4  23.0    267  15.2  15.9
 34   4.0   3.8    162  14.3  13.8    272  15.6  14.9
 36   5.5   4.6    167  17.5  17.8    274  12.1  12.3
 41   8.2   8.0    169  15.0  15.3    279  16.9  15.8
 43   9.3    00    174  12.7  12.9    281  15.5  15.6
 48   4.2   3.4    176  16.4  16.7    286  12.5  12.7
 50   2.0   2.0    181  24.1  24.2    288  14.5  14.5
 55   3.1   2.7    183  23.0  22.7    293  11.0  10.9
 57   0.5  -0.2    188  18.5  19.0    294  11.0  10.9
 62   4.0   2.7    190  19.9  20.0    295  11.5  10.9
 64   9.4   8.7    195  24.9  23.9    300  12.5  12.0
 69   1.2   0.7    197  27.9  26.4    302   4.6   4.3
 71   4.1   3.5    202  23.0  22.5    307   6.6   5.8
 76   4.3   4.0    204  26.9  25.0    309   9.0   8.1
 78   3.6   2.7    209  20.5  20.4    314  14.7  14.3
 83  10.9  11.0    211  16.3  17.2    316  12.4  12.2
 85   7.9   7.6    216  16.6  16.8    321  11.8  11.2
 90   8.4   8.4    217  17.3  18.3    323   5.0   4.8
 92   4.5   4.5    218  16.9  17.2    330   6.1   4.7
 97   4.8   5.0    223  14.2  14.3    335   8.8   8.1
 99   8.9   8.5    225  18.8  17.6    337   8.5   8.6
106  10.1  10.4    230  15.9  16.1    344  10.6  10.4
111   8.4   8.5    232  17.3  17.6    349   4.9   4.0
113  11.9  11.7    239  14.8  14.7    351  -1.5  -2.4
120  13.9  14.2    241  16.3  16.7    353   0.0  -3.2
125  11.8  12.0

Source: Data from T. Scott and M. Glendining, Rothamsted Research.

TABLE A.3A (continued)

Data from 50 Suction Trap Locations (Trap) During 1995: Julian Day When Aphid Myzus persicae First Caught in Trap (JDay), Trap Location
(Latitude, Longitude and Altitude) and Monthly Rainfall from October 1994 to May 1995 (Example 14.2 and File examine.dat)

                       Location                                     Monthly Rain
Trap  JDay  Latitude  Longitude  Altitude  October  November  December  January  February  March  April   May
  49   123     51.20       0.94        43    107.2      33.3      92.5    137.4      80.1   61.8   17.2  26.8
  50   141     50.32       1T55       240      153       20T       393      323       253   34.1   66.1  65.4

Source: Data from R. Harrington, Rothamsted Research.

TABLE A.3B (continued)

Data from 50 Suction Trap Locations (Trap) During 1995: Mean Temperature for the Coldest 30-Day Period (C30Day) and the Following 60-Day
Period (F60Day), and Proportion of Land in a Circle of Radius 75 km around the Trap Identified as Coniferous, Deciduous or Mixed Forest,
Grassland, Arable Crops, Inland Waters, Sea or Urban Use (Example 14.2 and File examine.dat)

Land Use

Appendix B: Quantiles of Statistical Distributions



TABLE B.1

95th Percentiles of F-Distribution with N Numerator and D Denominator df

D \ N        1        2        3        4        5        6        7        8        9       10
    1  161.448  199.500  215.707  224.583  230.162  233.986  236.768  238.883  240.543  241.882
    2   18.513   19.000   19.164   19.247   19.296   19.330   19.353   19.371   19.385   19.396
    3   10.128    9.552    9.277    9.117    9.013    8.941    8.887    8.845    8.812    8.785
    4    7.709    6.944    6.591    6.388    6.256    6.163    6.094    6.041    5.999    5.964
    5    6.608    5.786    5.409    5.192    5.050    4.950    4.876    4.818    4.772    4.735
    6    5.987    5.143    4.757    4.534    4.387    4.284    4.207    4.147    4.099    4.060
    7    5.591    4.737    4.347    4.120    3.972    3.866    3.787    3.726    3.677    3.637
    8    5.318    4.459    4.066    3.838    3.687    3.581    3.500    3.438    3.388    3.347
    9    5.117    4.256    3.863    3.633    3.482    3.374    3.293    3.230    3.179    3.137
   10    4.965    4.103    3.708    3.478    3.326    3.217    3.135    3.072    3.020    2.978
   11    4.844    3.982    3.587    3.357    3.204    3.095    3.012    2.948    2.896    2.854
   12    4.747    3.885    3.490    3.259    3.106    2.996    2.913    2.849    2.796    2.753
   13    4.667    3.806    3.411    3.179    3.025    2.915    2.832    2.767    2.714    2.671
   14    4.600    3.739    3.344    3.112    2.958    2.848    2.764    2.699    2.646    2.602
   15    4.543    3.682    3.287    3.056    2.901    2.790    2.707    2.641    2.588    2.544
   16    4.494    3.634    3.239    3.007    2.852    2.741    2.657    2.591    2.538    2.494
   17    4.451    3.592    3.197    2.965    2.810    2.699    2.614    2.548    2.494    2.450
   18    4.414    3.555    3.160    2.928    2.773    2.661    2.577    2.510    2.456    2.412
   19    4.381    3.522    3.127    2.895    2.740    2.628    2.544    2.477    2.423    2.378
   20    4.351    3.493    3.098    2.866    2.711    2.599    2.514    2.447    2.393    2.348
   22    4.301    3.443    3.049    2.817    2.661    2.549    2.464    2.397    2.342    2.297
   24    4.260    3.403    3.009    2.776    2.621    2.508    2.423    2.355    2.300    2.255
   26    4.225    3.369    2.975    2.743    2.587    2.474    2.388    2.321    2.265    2.220
   28    4.196    3.340    2.947    2.714    2.558    2.445    2.359    2.291    2.236    2.190
   30    4.171    3.316    2.922    2.690    2.534    2.421    2.334    2.266    2.211    2.165
   32    4.149    3.295    2.901    2.668    2.512    2.399    2.313    2.244    2.189    2.142
   34    4.130    3.276    2.883    2.650    2.494    2.380    2.294    2.225    2.170    2.123
   36    4.113    3.259    2.866    2.634    2.477    2.364    2.277    2.209    2.153    2.106
   38    4.098    3.245    2.852    2.619    2.463    2.349    2.262    2.194    2.138    2.091
   40    4.085    3.232    2.839    2.606    2.449    2.336    2.249    2.180    2.124    2.077
   45    4.057    3.204    2.812    2.579    2.422    2.308    2.221    2.152    2.096    2.049
   50    4.034    3.183    2.790    2.557    2.400    2.286    2.199    2.130    2.073    2.026
   55    4.016    3.165    2.773    2.540    2.383    2.269    2.181    2.112    2.055    2.008
   60    4.001    3.150    2.758    2.525    2.368    2.254    2.167    2.097    2.040    1.993
   70    3.978    3.128    2.736    2.503    2.346    2.231    2.143    2.074    2.017    1.969
   80    3.960    3.111    2.719    2.486    2.329    2.214    2.126    2.056    1.999    1.951
  100    3.936    3.087    2.696    2.463    2.305    2.191    2.103    2.032    1.975    1.927

D \ N       11       12       13       14       15       16       18       20       22       24
    1  242.984  243.906  244.690  245.364  245.950  246.464  247.323  248.013  248.579  249.052
    2   19.405   19.413   19.419   19.438   19.443   19.447   19.454   19.460   19.464   19.468
    3    8.763    8.745    8.729    8.715    8.703    8.692    8.674    8.660    8.648    8.638
    4    5.936    5.912    5.891    5.873    5.858    5.844    5.821    5.802    5.787    5.774
    5    4.704    4.678    4.655    4.636    4.619    4.604    4.578    4.558    4.541    4.527
    6    4.027    4.000    3.976    3.956    3.938    3.922    3.896    3.874    3.856    3.841
    7    3.603    3.575    3.550    3.529    3.511    3.494    3.467    3.444    3.426    3.410
    8    3.313    3.284    3.259    3.237    3.218    3.202    3.173    3.150    3.131    3.115
    9    3.102    3.073    3.048    3.025    3.006    2.989    2.960    2.936    2.917    2.900
   10    2.943    2.913    2.887    2.865    2.845    2.828    2.798    2.774    2.754    2.737
   11    2.818    2.788    2.761    2.739    2.719    2.701    2.671    2.646    2.626    2.609
   12    2.717    2.687    2.660    2.637    2.617    2.599    2.568    2.544    2.523    2.505
   13    2.635    2.604    2.577    2.554    2.533    2.515    2.484    2.459    2.438    2.420
   14    2.565    2.534    2.507    2.484    2.463    2.445    2.413    2.388    2.367    2.349
   15    2.507    2.475    2.448    2.424    2.403    2.385    2.353    2.328    2.306    2.288
   16    2.456    2.425    2.397    2.373    2.352    2.333    2.302    2.276    2.254    2.235
   17    2.413    2.381    2.353    2.329    2.308    2.289    2.257    2.230    2.208    2.190
   18    2.374    2.342    2.314    2.290    2.269    2.250    2.217    2.191    2.168    2.150
   19    2.340    2.308    2.280    2.256    2.234    2.215    2.182    2.155    2.133    2.114
   20    2.310    2.278    2.250    2.225    2.203    2.184    2.151    2.124    2.102    2.082
   22    2.259    2.226    2.198    2.173    2.151    2.131    2.098    2.071    2.048    2.028
   24    2.216    2.183    2.155    2.130    2.108    2.088    2.054    2.027    2.003    1.984
   26    2.181    2.148    2.119    2.094    2.072    2.052    2.018    1.990    1.966    1.946
   28    2.151    2.118    2.089    2.064    2.041    2.021    1.987    1.959    1.935    1.915
   30    2.126    2.092    2.063    2.037    2.015    1.995    1.960    1.932    1.908    1.887
   32    2.103    2.070    2.040    2.015    1.992    1.972    1.937    1.908    1.884    1.864
   34    2.084    2.050    2.021    1.995    1.972    1.952    1.917    1.888    1.863    1.843
   36    2.067    2.033    2.003    1.977    1.954    1.934    1.899    1.870    1.845    1.824
   38    2.051    2.017    1.988    1.962    1.939    1.918    1.883    1.853    1.829    1.808
   40    2.038    2.003    1.974    1.948    1.924    1.904    1.868    1.839    1.814    1.793
   45    2.009    1.974    1.945    1.918    1.895    1.874    1.838    1.808    1.783    1.762
   50    1.986    1.952    1.921    1.895    1.871    1.850    1.814    1.784    1.759    1.737
   55    1.968    1.933    1.903    1.876    1.852    1.831    1.795    1.764    1.739    1.717
   60    1.952    1.917    1.887    1.860    1.836    1.815    1.778    1.748    1.722    1.700
   70    1.928    1.893    1.863    1.836    1.812    1.790    1.753    1.722    1.696    1.674
   80    1.910    1.875    1.845    1.817    1.793    1.772    1.734    1.703    1.677    1.654
  100    1.886    1.850    1.819    1.792    1.768    1.746    1.708    1.676    1.650    1.627

D \ N       26       28       30       35       40       45       50       60       80      100
    1  249.453  249.797  250.095  250.693  251.143  251.494  251.774  252.196  252.724  253.041
    2   19.472   19.474   19.477   19.482   19.485   19.488   19.491   19.494   19.498   19.501
    3    8.630    8.623    8.617    8.604    8.594    8.587    8.581    8.572    8.561    8.554
    4    5.763    5.754    5.746    5.729    5.717    5.707    5.699    5.688    5.673    5.664
    5    4.515    4.505    4.496    4.477    4.464    4.453    4.444    4.431    4.415    4.405
    6    3.829    3.818    3.808    3.789    3.774    3.763    3.754    3.740    3.722    3.712
    7    3.397    3.386    3.376    3.356    3.340    3.328    3.319    3.304    3.286    3.275
    8    3.101    3.090    3.079    3.058    3.043    3.030    3.020    3.005    2.986    2.975
    9    2.886    2.874    2.864    2.842    2.826    2.813    2.803    2.787    2.767    2.755
   10    2.723    2.710    2.700    2.678    2.661    2.648    2.637    2.621    2.601    2.588
   11    2.594    2.582    2.570    2.548    2.531    2.517    2.506    2.490    2.469    2.456
   12    2.491    2.478    2.466    2.443    2.426    2.412    2.401    2.384    2.363    2.350
   13    2.405    2.392    2.380    2.357    2.339    2.325    2.314    2.297    2.275    2.261
   14    2.333    2.320    2.308    2.284    2.266    2.252    2.241    2.223    2.200    2.187
   15    2.272    2.259    2.247    2.223    2.204    2.190    2.178    2.160    2.137    2.123
   16    2.220    2.206    2.194    2.169    2.151    2.136    2.124    2.106    2.083    2.068
   17    2.174    2.160    2.148    2.123    2.104    2.089    2.077    2.058    2.035    2.020
   18    2.134    2.119    2.107    2.082    2.063    2.048    2.035    2.017    1.993    1.978
   19    2.098    2.084    2.071    2.046    2.026    2.011    1.999    1.980    1.955    1.940
   20    2.066    2.052    2.039    2.013    1.994    1.978    1.966    1.946    1.922    1.907
   22    2.012    1.997    1.984    1.958    1.938    1.922    1.909    1.889    1.864    1.849
   24    1.967    1.952    1.939    1.912    1.892    1.876    1.863    1.842    1.816    1.800
   26    1.929    1.914    1.901    1.874    1.853    1.837    1.823    1.803    1.776    1.760
   28    1.897    1.882    1.869    1.841    1.820    1.803    1.790    1.769    1.742    1.725
   30    1.870    1.854    1.841    1.813    1.792    1.775    1.761    1.740    1.712    1.695
   32    1.846    1.830    1.817    1.789    1.767    1.750    1.736    1.714    1.686    1.669
   34    1.825    1.809    1.795    1.767    1.745    1.728    1.713    1.691    1.663    1.645
   36    1.806    1.790    1.776    1.748    1.726    1.708    1.694    1.671    1.643    1.625
   38    1.790    1.774    1.760    1.731    1.708    1.691    1.676    1.653    1.624    1.606
   40    1.775    1.759    1.744    1.715    1.693    1.675    1.660    1.637    1.608    1.589
   45    1.743    1.727    1.713    1.683    1.660    1.642    1.626    1.603    1.573    1.554
   50    1.718    1.702    1.687    1.657    1.634    1.615    1.599    1.576    1.544    1.525
   55    1.698    1.681    1.666    1.636    1.612    1.593    1.577    1.553    1.521    1.501
   60    1.681    1.664    1.649    1.618    1.594    1.575    1.559    1.534    1.502    1.481
   70    1.654    1.637    1.622    1.591    1.566    1.546    1.530    1.505    1.471    1.450
   80    1.634    1.617    1.602    1.570    1.545    1.525    1.508    1.482    1.448    1.426
  100    1.607    1.589    1.573    1.541    1.515    1.494    1.477    1.450    1.415    1.392

TABLE B.2

Percentiles of t- and Chi-Squared Distributions with D df

              t-Distribution                 Chi-Squared Distribution
  D    95th  97.5th    99th  99.5th       95th   97.5th     99th   99.5th
  1   6.314  12.706  31.821  63.657      3.841    5.024    6.635    7.879
  2   2.920   4.303   6.965   9.925      5.991    7.378    9.210   10.597
  3   2.353   3.182   4.541   5.841      7.815    9.348   11.345   12.838
  4   2.132   2.776   3.747   4.604      9.488   11.143   13.277   14.860
  5   2.015   2.571   3.365   4.032     11.070   12.833   15.086   16.750
  6   1.943   2.447   3.143   3.707     12.592   14.449   16.812   18.548
  7   1.895   2.365   2.998   3.499     14.067   16.013   18.475   20.278
  8   1.860   2.306   2.896   3.355     15.507   17.535   20.090   21.955
  9   1.833   2.262   2.821   3.250     16.919   19.023   21.666   23.589
 10   1.812   2.228   2.764   3.169     18.307   20.483   23.209   25.188
 11   1.796   2.201   2.718   3.106     19.675   21.920   24.725   26.757
 12   1.782   2.179   2.681   3.055     21.026   23.337   26.217   28.300
 13   1.771   2.160   2.650   3.012     22.362   24.736   27.688   29.819
 14   1.761   2.145   2.624   2.977     23.685   26.119   29.141   31.319
 15   1.753   2.131   2.602   2.947     24.996   27.488   30.578   32.801
 16   1.746   2.120   2.583   2.921     26.296   28.845   32.000   34.267
 17   1.740   2.110   2.567   2.898     27.587   30.191   33.409   35.718
 18   1.734   2.101   2.552   2.878     28.869   31.526   34.805   37.156
 19   1.729   2.093   2.539   2.861     30.144   32.852   36.191   38.582
 20   1.725   2.086   2.528   2.845     31.410   34.170   37.566   39.997
 22   1.717   2.074   2.508   2.819     33.924   36.781   40.289   42.796
 24   1.711   2.064   2.492   2.797     36.415   39.364   42.980   45.559
 26   1.706   2.056   2.479   2.779     38.885   41.923   45.642   48.290
 28   1.701   2.048   2.467   2.763     41.337   44.461   48.278   50.993
 30   1.697   2.042   2.457   2.750     43.773   46.979   50.892   53.672
 32   1.694   2.037   2.449   2.738     46.194   49.480   53.486   56.328
 34   1.691   2.032   2.441   2.728     48.602   51.966   56.061   58.964
 36   1.688   2.028   2.434   2.719     50.998   54.437   58.619   61.581
 38   1.686   2.024   2.429   2.712     53.384   56.896   61.162   64.181
 40   1.684   2.021   2.423   2.704     55.758   59.342   63.691   66.766
 42   1.682   2.018   2.418   2.698     58.124   61.777   66.206   69.336
 44   1.680   2.015   2.414   2.692     60.481   64.201   68.710   71.893
 46   1.679   2.013   2.410   2.687     62.830   66.617   71.201   74.437
 48   1.677   2.011   2.407   2.682     65.171   69.023   73.683   76.969
 50   1.676   2.009   2.403   2.678     67.505   71.420   76.154   79.490
 55   1.673   2.004   2.396   2.668     73.311   77.380   82.292   85.749
 60   1.671   2.000   2.390   2.660     79.082   83.298   88.379   91.952
 65   1.669   1.997   2.385   2.654     84.821   89.177   94.422   98.105
 70   1.667   1.994   2.381   2.648     90.531   95.023  100.425  104.215
 75   1.665   1.992   2.377   2.643     96.217  100.839  106.393  110.286
 80   1.664   1.990   2.374   2.639    101.879  106.629  112.329  116.321
 85   1.663   1.988   2.371   2.635    107.522  112.393  118.236  122.325
 90   1.662   1.987   2.368   2.632    113.145  118.136  124.116  128.299
100   1.660   1.984   2.364   2.626    124.342  129.561  135.807  140.169


Appendix C: Statistical and Mathematical 
Results 



C.1 Derivation of Least Squares Estimates for a Model
with a Single Factor

For a set of N observations with a single explanatory factor, we represent the data as $y_{jk}$,
$j = 1 \ldots t$, $k = 1 \ldots n_j$, where $y_{jk}$ is the kth observation for the jth treatment, t is the number of
treatments and $n_j$ is the number of replicates of the jth treatment, with $N = n_1 + n_2 + \cdots + n_t$.
The model (Equation 4.1) is written as

$$ y_{jk} = \mu_j + e_{jk} , $$

where $\mu_j$ is the unknown population mean for the jth treatment, and $e_{jk}$ is the deviation
from that population mean for the kth observation on that treatment. We can write the
residual sum of squares (Section 4.2) as a function of the estimated population means:

$$ \mathrm{ResSS}(\hat{\mu}_1 \ldots \hat{\mu}_t) = \sum_{j=1}^{t} \sum_{k=1}^{n_j} (y_{jk} - \hat{\mu}_j)^2 . $$



We use a standard mathematical approach to find the estimates that minimize this func- 
tion. At any local minimum of a continuous function, its first derivative will be equal to 
zero and its second derivative will be positive. We therefore take the first derivative of the 
ResSS function with respect to each of the estimates, set the resulting equations equal to 
zero, and solve them to obtain estimates that minimize the ResSS. We can verify that we 
have found a minimum by calculating the second derivative of the ResSS function at these 
estimates. 

The first derivative of the ResSS function with respect to $\hat{\mu}_j$ is

$$ \frac{\partial\,\mathrm{ResSS}}{\partial \hat{\mu}_j} = -2 \sum_{k=1}^{n_j} (y_{jk} - \hat{\mu}_j) . $$

If we set this equation equal to zero and solve for $\hat{\mu}_j$, we find

$$ 0 = -2 \sum_{k=1}^{n_j} (y_{jk} - \hat{\mu}_j) = -2 \sum_{k=1}^{n_j} y_{jk} + 2 \sum_{k=1}^{n_j} \hat{\mu}_j = -2 \sum_{k=1}^{n_j} y_{jk} + 2 n_j \hat{\mu}_j , $$

which can be rearranged to give a unique solution as

$$ \hat{\mu}_j = \frac{1}{n_j} \sum_{k=1}^{n_j} y_{jk} = \bar{y}_{j\cdot} . $$

To check that we have found a minimum, we calculate the second derivative as

$$ \frac{\partial^2\,\mathrm{ResSS}}{\partial \hat{\mu}_j^2} = 2 n_j , $$

which is positive as required, and in fact is constant (and hence positive everywhere). The
set of estimates that minimize the ResSS are hence the set of treatment sample means. For
further details, Kuehl (2000) presents a simple demonstration for the CRD and Searle (1982)
shows a complete derivation for any linear model using matrix notation (matrix notation
is introduced in Section 15.6.1).
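
As an informal numerical check of this result, the following Python sketch evaluates the
ResSS for a small set of made-up data and confirms that moving any single estimate away
from its treatment sample mean increases the ResSS. The data values, the treatment labels
and the function name res_ss are assumptions introduced purely for illustration; they are
not taken from any example in this book.

```python
# Hypothetical data: three treatments with unequal replication (illustrative values only).
data = {
    "A": [4.1, 5.0, 4.6],
    "B": [6.2, 5.8, 6.5, 6.1],
    "C": [3.9, 4.4],
}


def res_ss(estimates, data):
    """Residual sum of squares for a given set of treatment-mean estimates."""
    return sum((y - estimates[trt]) ** 2 for trt, ys in data.items() for y in ys)


# Least squares estimates from the derivation: the treatment sample means.
sample_means = {trt: sum(ys) / len(ys) for trt, ys in data.items()}
min_ss = res_ss(sample_means, data)

# Moving any single estimate away from its sample mean increases the ResSS.
for trt in data:
    for shift in (-0.5, -0.01, 0.01, 0.5):
        perturbed = dict(sample_means)
        perturbed[trt] += shift
        assert res_ss(perturbed, data) > min_ss

print(sample_means, min_ss)
```

Because the second derivative is the constant $2 n_j$, shifting the jth estimate by an amount d
increases the ResSS by exactly $n_j d^2$, which is why every perturbation in the loop fails
to improve on the sample means.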



C.2 Partitioning the Total Sum of Squares for a Model 
with a Single Factor 

In Section 4.3.1, we saw that the total sum of squares takes the form 

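$$ \mathrm{TotSS} = \sum_{j=1}^{t} \sum_{k=1}^{n_j} (y_{jk} - \bar{y})^2 . $$

This formula can be expanded, without any change in its value, by subtraction and then
addition of each group mean to give

$$ \mathrm{TotSS} = \sum_{j=1}^{t} \sum_{k=1}^{n_j} (y_{jk} - \bar{y}_{j\cdot} + \bar{y}_{j\cdot} - \bar{y})^2 = \sum_{j=1}^{t} \sum_{k=1}^{n_j} [(y_{jk} - \bar{y}_{j\cdot}) + (\bar{y}_{j\cdot} - \bar{y})]^2 . $$

We then make use of the following relationship for two quantities A and B:

$$ (A + B)^2 = (A + B)(A + B) = A^2 + 2AB + B^2 . $$

We now substitute $A = y_{jk} - \bar{y}_{j\cdot}$ and $B = \bar{y}_{j\cdot} - \bar{y}$ into this expression to get

$$
\begin{aligned}
\mathrm{TotSS} &= \sum_{j=1}^{t} \sum_{k=1}^{n_j} [(y_{jk} - \bar{y}_{j\cdot}) + (\bar{y}_{j\cdot} - \bar{y})]^2 \\
&= \sum_{j=1}^{t} \sum_{k=1}^{n_j} [(y_{jk} - \bar{y}_{j\cdot})^2 + 2 (y_{jk} - \bar{y}_{j\cdot})(\bar{y}_{j\cdot} - \bar{y}) + (\bar{y}_{j\cdot} - \bar{y})^2] \\
&= \sum_{j=1}^{t} \sum_{k=1}^{n_j} (y_{jk} - \bar{y}_{j\cdot})^2 + 2 \sum_{j=1}^{t} \sum_{k=1}^{n_j} (y_{jk} - \bar{y}_{j\cdot})(\bar{y}_{j\cdot} - \bar{y}) + \sum_{j=1}^{t} \sum_{k=1}^{n_j} (\bar{y}_{j\cdot} - \bar{y})^2 .
\end{aligned}
\tag{C.1}
$$

We will consider each of the components of Equation C.1 in turn. The first component
is equal to the residual sum of squares (ResSS, Section 4.3.1). The third component is the
treatment sum of squares (TrtSS, Section 4.3.1), which can be rewritten as

$$ \mathrm{TrtSS} = \sum_{j=1}^{t} \sum_{k=1}^{n_j} (\bar{y}_{j\cdot} - \bar{y})^2 = \sum_{j=1}^{t} n_j (\bar{y}_{j\cdot} - \bar{y})^2 . $$

We now look at the second component of Equation C.1, which can be rewritten as

$$ 2 \sum_{j=1}^{t} \sum_{k=1}^{n_j} (y_{jk} - \bar{y}_{j\cdot})(\bar{y}_{j\cdot} - \bar{y}) = 2 \sum_{j=1}^{t} \left[ (\bar{y}_{j\cdot} - \bar{y}) \sum_{k=1}^{n_j} (y_{jk} - \bar{y}_{j\cdot}) \right] . $$

We can perform summation over the k index first (for each value of j), to give

$$ \sum_{k=1}^{n_j} (y_{jk} - \bar{y}_{j\cdot}) = \sum_{k=1}^{n_j} y_{jk} - \sum_{k=1}^{n_j} \bar{y}_{j\cdot} = y_{j\cdot} - n_j \bar{y}_{j\cdot} = y_{j\cdot} - y_{j\cdot} = 0 , $$

where $y_{j\cdot} = \sum_{k=1}^{n_j} y_{jk}$ is the total of the observations for the jth treatment, so that
$n_j \bar{y}_{j\cdot} = y_{j\cdot}$.

The second component of Equation C.1 is therefore also equal to zero, leaving the result as
required, i.e.

$$ \sum_{j=1}^{t} \sum_{k=1}^{n_j} (y_{jk} - \bar{y})^2 = \sum_{j=1}^{t} \sum_{k=1}^{n_j} (\bar{y}_{j\cdot} - \bar{y})^2 + \sum_{j=1}^{t} \sum_{k=1}^{n_j} (y_{jk} - \bar{y}_{j\cdot})^2 , $$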

which is equivalent to 



TotSS = TrtSS + ResSS . 

The same result holds for any linear model, and the principle of this proof still holds 
although the details become more complicated when there are more terms in the model. 
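
The partition can also be checked numerically. The Python sketch below computes the three
sums of squares and the cross-product component for a small set of made-up data; the data
values and variable names are assumptions chosen only for illustration.

```python
# Hypothetical data: three treatments with unequal replication (illustrative values only).
data = {
    "A": [4.1, 5.0, 4.6],
    "B": [6.2, 5.8, 6.5, 6.1],
    "C": [3.9, 4.4],
}

all_obs = [y for ys in data.values() for y in ys]
grand_mean = sum(all_obs) / len(all_obs)
group_means = {trt: sum(ys) / len(ys) for trt, ys in data.items()}

# Total, residual and treatment sums of squares, following the formulae above.
tot_ss = sum((y - grand_mean) ** 2 for y in all_obs)
res_ss = sum((y - group_means[trt]) ** 2 for trt, ys in data.items() for y in ys)
trt_ss = sum(len(ys) * (group_means[trt] - grand_mean) ** 2 for trt, ys in data.items())

# Cross-product (second) component of the expansion, which should vanish.
cross = 2 * sum(
    (y - group_means[trt]) * (group_means[trt] - grand_mean)
    for trt, ys in data.items()
    for y in ys
)

print(tot_ss, trt_ss + res_ss, cross)
```

Up to floating-point rounding, the first two printed values agree and the third is zero,
whatever data values are used.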



C.3 Derivation of Least Squares Estimates for a Model 
with a Single Variate 

For a set of N observations with a single explanatory variate, we represent the data as $y_i$, for
$i = 1 \ldots N$, where $y_i$ is the ith observation. The model (Equation 12.1) is written as

$$ y_i = \alpha + \beta x_i + e_i , $$

where $\alpha$ is the intercept and $\beta$ is the slope of the straight line relationship, $x_i$ is the value of
the explanatory variate and $e_i$ is the deviation from the straight line for the ith observation.
We can write the residual sum of squares (Section 12.2) as a function of parameter
estimates in the form

$$ \mathrm{ResSS}(\hat{\alpha}, \hat{\beta}) = \sum_{i=1}^{N} (y_i - \hat{\alpha} - \hat{\beta} x_i)^2 . $$

We use the same approach as Section C.1. We take the first derivative of the ResSS function
with respect to each of the estimates, set the resulting equations equal to zero, and solve
them to obtain estimates that minimize the ResSS. We can verify that we have found a
minimum by calculating the second derivative at these estimates.

The first derivative of the ResSS function with respect to $\hat{\alpha}$ is

$$ \frac{\partial\,\mathrm{ResSS}}{\partial \hat{\alpha}} = -2 \sum_{i=1}^{N} (y_i - \hat{\alpha} - \hat{\beta} x_i) = -2 \sum_{i=1}^{N} y_i + 2 N \hat{\alpha} + 2 \hat{\beta} \sum_{i=1}^{N} x_i . $$

If we set this equation equal to zero and solve for $\hat{\alpha}$, we find

$$ N \hat{\alpha} = \sum_{i=1}^{N} y_i - \hat{\beta} \sum_{i=1}^{N} x_i , $$

which can be rearranged to give a unique solution as

$$ \hat{\alpha} = \bar{y} - \hat{\beta} \bar{x} . $$

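The first derivative of the ResSS function with respect to $\hat{\beta}$ is

$$ \frac{\partial\,\mathrm{ResSS}}{\partial \hat{\beta}} = -2 \sum_{i=1}^{N} x_i (y_i - \hat{\alpha} - \hat{\beta} x_i) = -2 \sum_{i=1}^{N} x_i y_i + 2 \hat{\alpha} \sum_{i=1}^{N} x_i + 2 \hat{\beta} \sum_{i=1}^{N} x_i^2 . $$

If we set this equation equal to zero and solve for $\hat{\beta}$, we find

$$ \hat{\beta} \sum_{i=1}^{N} x_i^2 = \sum_{i=1}^{N} x_i y_i - \hat{\alpha} \sum_{i=1}^{N} x_i . $$

At this point, we need to substitute for our estimate $\hat{\alpha}$, as we cannot have both estimates
defined in terms of the other. This gives

$$ \hat{\beta} \sum_{i=1}^{N} x_i^2 = \sum_{i=1}^{N} x_i y_i - (\bar{y} - \hat{\beta} \bar{x}) \sum_{i=1}^{N} x_i = \sum_{i=1}^{N} x_i y_i - N \bar{x} \bar{y} + N \hat{\beta} \bar{x}^2 . $$

We need to group terms with $\hat{\beta}$ together, to get the revised form

$$ \hat{\beta} \left( \sum_{i=1}^{N} x_i^2 - N \bar{x}^2 \right) = \sum_{i=1}^{N} x_i y_i - N \bar{x} \bar{y} . $$

We can then use the following identity for sums of squares to simplify the expressions:

$$ \sum_{i=1}^{N} (y_i - \bar{y})(x_i - \bar{x}) = \sum_{i=1}^{N} y_i x_i - N \bar{y} \bar{x} . $$

This relationship holds for any two variables, giving

$$ \hat{\beta} = \frac{\sum_{i=1}^{N} x_i y_i - N \bar{x} \bar{y}}{\sum_{i=1}^{N} x_i^2 - N \bar{x}^2} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})} = \frac{SS_{xy}}{SS_{xx}} , $$

as in Section 12.2. In both cases, the second derivative is positive as required.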
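
The closed-form estimates can be checked numerically. The following Python sketch, using
made-up x and y values and variable names chosen only for illustration, computes the slope
from both the centred and the uncentred forms and then confirms that both first derivatives
of the ResSS are zero at the estimates.

```python
# Hypothetical data (illustrative values only).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Centred form: sums of squares and cross-products about the means.
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
beta_hat = ss_xy / ss_xx
alpha_hat = y_bar - beta_hat * x_bar

# Uncentred form, which is algebraically identical.
beta_raw = (sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar) / (
    sum(xi ** 2 for xi in x) - n * x_bar ** 2
)

# First derivatives of the ResSS evaluated at the estimates.
residuals = [yi - alpha_hat - beta_hat * xi for xi, yi in zip(x, y)]
d_alpha = -2 * sum(residuals)
d_beta = -2 * sum(xi * ri for xi, ri in zip(x, residuals))

print(alpha_hat, beta_hat, beta_raw, d_alpha, d_beta)
```

Up to floating-point rounding, the two slope values agree and both derivatives are zero.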



C.4 Variances and Standard Errors of Linear Combinations 
of Random Variables 

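For a set of m random variables $Y_1 \ldots Y_m$, the variance of a linear combination Z, where

$$ Z = \sum_{i=1}^{m} a_i Y_i = a_1 Y_1 + \ldots + a_m Y_m , $$

is calculated as

$$ \mathrm{Var}(Z) = \sum_{i=1}^{m} a_i^2 \mathrm{Var}(Y_i) + 2 \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} a_i a_j \mathrm{Cov}(Y_i, Y_j) . $$

The following results can be derived directly:

$$ \mathrm{Var}(Y_1 + Y_2) = \mathrm{Var}(Y_1) + \mathrm{Var}(Y_2) + 2\,\mathrm{Cov}(Y_1, Y_2) $$

$$ \mathrm{Var}(Y_1 - Y_2) = \mathrm{Var}(Y_1) + \mathrm{Var}(Y_2) - 2\,\mathrm{Cov}(Y_1, Y_2) $$

$$ \mathrm{Var}(Y_1 + Y_2 + Y_3) = \mathrm{Var}(Y_1) + \mathrm{Var}(Y_2) + \mathrm{Var}(Y_3) + 2\,\mathrm{Cov}(Y_1, Y_2) + 2\,\mathrm{Cov}(Y_1, Y_3) + 2\,\mathrm{Cov}(Y_2, Y_3) $$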



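For the mean of m variables, $\bar{Y} = \frac{1}{m} \sum_{i=1}^{m} Y_i$, we have $a_i = \frac{1}{m}$ for $i = 1 \ldots m$. Then

$$ \mathrm{Var}(\bar{Y}) = \frac{1}{m^2} \sum_{i=1}^{m} \mathrm{Var}(Y_i) + \frac{2}{m^2} \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} \mathrm{Cov}(Y_i, Y_j) . $$

If the m variables are independent (zero covariance) with variance equal to $\sigma^2$, this gives

$$ \mathrm{Var}(\bar{Y}) = \frac{\sigma^2}{m} . $$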
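
The general formula can be illustrated numerically. Sample variances and covariances
(calculated with divisor n - 1) obey the same identity as the population quantities, so the
Python sketch below compares the sample variance of a linear combination, computed
directly, with the value given by the formula. The data values, the coefficients and the
helper functions mean and cov are assumptions introduced only for illustration.

```python
from itertools import combinations


def mean(v):
    return sum(v) / len(v)


def cov(u, v):
    """Sample covariance with divisor n - 1; cov(u, u) is the sample variance."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


# Hypothetical observations on m = 3 variables (one list per variable).
ys = [
    [2.0, 4.0, 3.0, 5.0, 6.0],
    [1.0, 3.0, 2.5, 2.0, 4.0],
    [7.0, 6.0, 8.0, 5.5, 6.5],
]
a = [0.5, -1.0, 2.0]  # hypothetical coefficients a_1 ... a_m
m = len(ys)

# Direct calculation: form Z for each observation, then take its variance.
z = [sum(a[i] * ys[i][obs] for i in range(m)) for obs in range(len(ys[0]))]
var_direct = cov(z, z)

# Formula: weighted variances plus twice the weighted covariances for pairs i < j.
var_formula = sum(a[i] ** 2 * cov(ys[i], ys[i]) for i in range(m)) + 2 * sum(
    a[i] * a[j] * cov(ys[i], ys[j]) for i, j in combinations(range(m), 2)
)

print(var_direct, var_formula)
```

Setting every coefficient to 1/m in this sketch gives the corresponding check for the
variance of the mean.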



C.5 Matrix Addition and Multiplication 

A matrix A of size $p \times q$ is an array of numbers with p rows and q columns. We write the
elements of matrix A as $A_{ij}$ for $i = 1 \ldots p$ and $j = 1 \ldots q$; for example, a $2 \times 3$ matrix A takes
the form

$$ A = \begin{pmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \end{pmatrix} . $$

A vector is a special case of a matrix with one column (q = 1). Two matrices with the same
dimensions (same number of rows and same number of columns) can be added together
via addition of their corresponding elements. So, for matrices A and B of the same size,
matrix C = A + B means that $C_{ij} = A_{ij} + B_{ij}$. The matrix product operation is more complex,
and we denote it with the compound symbol *+. The product of two matrices A and B can
be formed as AB = A *+ B only if the number of columns of A is equal to the number of
rows in B. For a $p \times q$ matrix A and a $q \times r$ matrix B, we can form their product as C = A *+ B
with matrix C, of size $p \times r$, having elements defined as

$$ C_{ij} = \sum_{k=1}^{q} A_{ik} B_{kj} \quad \text{for } i = 1 \ldots p,\ j = 1 \ldots r . $$

This is sometimes described as taking the vector product of a row of matrix A with a
column of matrix B.
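
These definitions translate directly into code. The Python sketch below implements
element-wise addition and the matrix product for matrices stored as lists of rows; the
function names mat_add and mat_mul and the example matrices are assumptions made for
illustration, and in practice a statistical package or numerical library would be used.

```python
def mat_add(A, B):
    """Element-wise sum of two matrices with the same dimensions."""
    if len(A) != len(B) or len(A[0]) != len(B[0]):
        raise ValueError("matrices must have the same dimensions")
    return [[A[i][j] + B[i][j] for j in range(len(A[0]))] for i in range(len(A))]


def mat_mul(A, B):
    """Product of a p x q matrix A and a q x r matrix B, giving a p x r matrix."""
    p, q = len(A), len(A[0])
    if len(B) != q:
        raise ValueError("columns of A must equal rows of B")
    r = len(B[0])
    # C[i][j] is the sum over k of A[i][k] * B[k][j].
    return [[sum(A[i][k] * B[k][j] for k in range(q)) for j in range(r)] for i in range(p)]


A = [[1, 2, 3], [4, 5, 6]]          # 2 x 3
B = [[7, 8], [9, 10], [11, 12]]     # 3 x 2

print(mat_add(A, A))  # [[2, 4, 6], [8, 10, 12]]
print(mat_mul(A, B))  # [[58, 64], [139, 154]]
```

Attempting mat_mul(A, A) raises an error, because A has three columns but only two rows.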



Statistics 



Written in simple language with relevant examples, Statistical Methods in 
Biology: Design and Analysis of Experiments and Regression is a practi- 
cal and illustrative guide to the design of experiments and data analysis in the 
biological and agricultural sciences. The book presents statistical ideas in the 
context to which they are being applied, drawing on relevant examples from the 
authors’ experience. 

Taking a practical and intuitive approach, the book includes mathematical for- 
mulae only where this helps to formalise and explain the methods being applied, 
providing extended discussions of examples based on real data sets arising from 
scientific research. The authors analyse data in detail to illustrate the use of basic 
formulae for simple examples while using statistical packages for more complex 
examples. The associated website (www.stats4biol.info) shows how to obtain 
the example analyses in the GenStat®, R and SAS® statistical packages. This on- 
line material provides a basic introduction to the facilities in each package, with 
code for all of the examples and half of the exercises in each chapter. 

By the time you reach the end of the book and online material you will have 
gained: 

• A clear appreciation of the importance of a statistical approach to the 
design of your experiments, 

• A sound understanding of the statistical methods used to analyse data 
obtained from designed experiments, and of the regression approaches 
used to construct simple models to describe the observed response as a 
function of explanatory variables, 

• Knowledge of how to use statistical packages to analyse data with the 
approaches described, and most importantly, 

• An appreciation of how to interpret the results of these statistical analyses 
in the context of the biological or agricultural science within which you are 
working. 

The book concludes with a practical guide to design and data analysis. Overall,
it gives you the statistical understanding required to successfully identify and apply
these statistical methods to add value to your scientific research.